CHAPTER 1


INTRODUCTION  AND  SUMMARY 


This  dissertation  is  concerned  with  a class  of  problems  in  the 
optimal  control  of  one-dimensional  diffusion  processes.  We  begin  this 
chapter  with  an  informal  description  of  our  general  problem,  paraphrasing 
the  precise  formulation  to  be  given  later. 

We consider a controller who must employ at each point in time
one of a finite number of available control modes (or actions). Her
choices influence the evolution of a one-dimensional stochastic process
$\{X(t);\ t \ge 0\}$ in the following way. At time zero we are given an
initial non-negative level for the process $X$ and an initial control
mode. Thereafter, whenever control mode $a$ is in use, $X$ evolves as
a Brownian Motion with drift $\mu_a$ and variance $\sigma_a^2$ that is
either absorbed or instantaneously reflected at the origin. We call
$X(t)$ the state of the system at time $t$. The controller is able to
continuously monitor the process $X$ but is not able to control the
boundary behavior at zero.

There are costs associated with the system as follows. Whenever
mode $a$ is employed the controller continuously incurs operational costs
at rate $r_a$ and linear holding costs at rate $hX(t)$. In addition, a
lump sum switching cost of $K_{ij}$ is incurred instantaneously each time
there is a switch from mode $i$ to mode $j$. Further, in the case of
an absorbing barrier there is a fixed boundary cost $R$ imposed when the
process $X$ hits the origin. Finally, all costs are discounted at a
positive interest rate, and the objective is to minimize expected total
discounted costs over the infinite planning horizon.

We allow a very general class of control strategies, where the
controller's current choice of action may depend in an arbitrary (measurable)
way  on  past  state  observations  and  control  mode  selections.  Intuition 
suggests,  however,  that  attention  can  be  restricted  to  stationary  Markov 
policies,  where  changes  in  the  control  mode  are  dictated  by  only  the 
current  state  and  current  control  mode  in  a non-time  dependent  manner. 

After  collecting  in  Chapter  2 various  useful  preliminary  results, 
we  present  in  Chapter  3 a precise  mathematical  formulation  of  the  control 
problem  under  study.  Salient  features  in  this  development  are  the  general 
definition  of  an  admissible  strategy,  the  precise  definition  of  a stationary 
policy,  the  natural  definition  of  the  controlled  diffusion  process 
associated  with  our  admissible  strategies  and  stationary  policies,  and 
an  analytical  characterization  of  the  expected  total  discounted  costs 
under a stationary policy. In Chapter 4 we provide necessary and
sufficient conditions for a given stationary policy to be optimal. We conjecture
that  there  always  exists  a stationary  policy  that  is  optimal,  but  offer 
no  general  proof  at  this  time. 

The main results of this dissertation concern the control problem
described above in the case of two available control modes. Optimal
stationary policies and their associated expected costs are explicitly
computed for such problems in Chapter 5. When the switching costs are all
zero and with either absorption or reflection, the optimal policy dictates
choice of action as a function of the current state of the system only and
is characterized by a single critical number. One control mode is to be
used whenever the process $X$ is above this critical number $z$
($0 < z < \infty$), and the other control mode is used if $X$ is below
level $z$. The single critical number is shown to be the unique solution
to a complicated transcendental equation.


When  there  is  a positive  switching  cost  for  a change  in  the  control 
mode,  and  with  either  boundary  behavior,  there  again  is  a stationary 
policy  that  is  optimal.  But  in  this  situation,  the  optimal  policy  selects 
actions  according  to  a function  of  both  current  state  and  current  control 
mode. This policy is characterized by two critical numbers $z$ and $Z$,
$0 < z < Z < \infty$. One control mode is called for whenever the state of the
system  is  above  Z,  and  the  other  mode  is  used  when  the  state  is  below  z. 
If  the  process  X falls  between  the  critical  numbers,  the  controller 
simply  maintains  the  control  mode  currently  in  use.  As  in  the  case  of 
zero  switching  costs,  we  are  able  to  derive  formulas  for  the  calculation 
of  these  critical  numbers. 
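
As an illustration only, the two-critical-number rule just described can be
sketched in a few lines of Python. The function name next_mode, the mode
labels 1 and 2, and the choice to keep the current mode when the state is
exactly equal to z or Z are assumptions made for this sketch; they are not
part of the formulation given later.

```python
def next_mode(x, mode, z, Z, low_mode=1, high_mode=2):
    """Two-critical-number (hysteresis) rule: return the mode to use next.

    x    : current state of the system, x >= 0
    mode : control mode currently in use
    z, Z : critical numbers with 0 < z < Z
    low_mode, high_mode : hypothetical labels for the modes called for
                          below z and above Z, respectively
    """
    if not 0 < z < Z:
        raise ValueError("critical numbers must satisfy 0 < z < Z")
    if x > Z:
        return high_mode
    if x < z:
        return low_mode
    # Between the critical numbers (endpoints included, by assumption):
    # simply maintain the mode currently in use.
    return mode


if __name__ == "__main__":
    mode = 1
    for x in [0.5, 1.5, 2.5, 1.5, 0.5]:
        mode = next_mode(x, mode, z=1.0, Z=2.0)
        print(x, mode)
```

With zero switching costs the same sketch would apply with the two critical
numbers collapsed to one, so that the current mode no longer matters.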

Our study is related to earlier work by Mandl (1968) and Pliska
(1973) on the optimal control of diffusion processes. The added feature
of  our  formulation  is  the  inclusion  of  lump  sum  switching  costs,  whereas 
Mandl  and  Pliska  assume  that  costs  are  continuously  incurred  at  a rate 
dependent  on  both  current  state  and  current  action.  In  the  following 
regards,  however,  our  formulation  is  more  specialized  than  those  of  Mandl 
and  Pliska. 

(1) There are available only a finite number of control modes.

(2) The infinitesimal mean and infinitesimal variance of our controlled
process $X$ depend on the action currently in use but not on the
current state of the system.

(3) Our cost rate function has the special form $g(x, a) = hx + r_a$, when
in state $x$ and employing mode $a$.

In  formulating  and  analyzing  our  stochastic  control  problem,  we 
use  the  Ito  approach  to  diffusion  theory,  whereas  Mandl  and  Pliska  rely 
heavily  on  the  analytical  theory  of  diffusions  (and  general  Markov  processes) 
associated with Feller (1954, 1957) and Dynkin (1965). Given the essential
restriction (2), our approach yields a natural definition of admissible
(generally non-stationary) strategies, whereas Mandl and Pliska must
confine attention to stationary policies throughout their formulations.
In addition, we avoid most of the complexity inherent in the Feller-Dynkin
approach to diffusions by throwing all analytical issues onto Ito's lemma,
a relatively simple sample path relationship.

One  large  area  of  application  for  our  control  problem  with  reflection 
is  in  the  control  of  queueing  systems.  Consider  a service  facility  with 
infinite  waiting  room  and  a controller  who  may  choose  at  each  point  in 
time  one  of  two  different  available  servers  for  duty  in  the  system.  There 
are no specific assumptions about the form of the interarrival time
distribution or the service time distributions, except that the arrival
process  is  stationary  over  time  and  that  the  server  on  duty  can  not 
remain  idle  if  the  queue  length  is  positive.  At  any  time  the  controller's 
choice  of  server  is  dependent  only  on  the  current  queue  length  and  identity 
of  the  server  currently  on  duty.  For  a given  stationary  control  policy 
as defined in Chapter 3 here, it is shown in Rath (1975) that as a sequence
of these controlled queues converges to heavy traffic conditions, a
normalized sequence of the queue length process converges weakly to the
controlled reflected diffusion process formalized here. (Rath's meaning
of heavy traffic conditions is that the mean service rate for each
available server is approximately equal to the mean arrival rate.) He shows
also  that  the  accumulated  costs  without  discounting  arising  from  operating, 
holding,  and  switching  costs  likewise  converge  weakly  to  the  respective 
total  undiscounted  costs  generated  by  the  controlled  reflected  diffusion 
process.  These  results  can  be  extended  to  a cost  structure  that  includes 
discounting,  an  arbitrary  finite  number  of  available  servers,  multiple 
arrival  channels  and  multiple  service  slots,  and  thus  our  control  problem 
is  appropriate  in  studying  the  optimal  control  of  a variety  of  queueing 
systems  in  heavy  traffic. 

Diffusion models, in general, and modified versions of our control
problem,  in  particular,  have  also  received  attention  in  application  to 
water  reservoirs,  stochastic  cash  management,  collective  risk  theory, 
inventories  and  other  input-output  systems.  Here  we  mention  two  such 
examples.  Others  are  included  in  the  bibliography.  Puterman  (1975)  uses 
diffusion  processes  to  model  continuous  time  storage  systems  where  in 
the  language  of  our  control  problem  there  are  two  available  control  modes 
such that $\mu_1 > 0$ and $\mu_2 < 0$, linear holding costs, switching costs,
no operational costs, and no barrier at zero (i.e., backlogging is allowed).
There is also no discounting of costs over time, and under the objective of
minimizing long-run average costs he investigates optimizing over the class of
our two critical numbers policies where $z < 0 < Z$. Zuckerman (1977) studies
a finite capacity water reservoir via our control problem where there is
both an upper and lower reflecting barrier and two available control modes
such that $\mu_1 > 0$ and $\sigma_1^2 = \sigma_2^2$. (The input of water
into the reservoir is a Brownian Motion process with positive drift
while the controller chooses water to be continuously released at rate 0
or $M$.) There are no holding costs in the Zuckerman model, and the
operational and switching costs are such that $r_1 = 0$, $r_2 < 0$, and
one of the two switching costs is zero while the other is positive. His
objective is to minimize discounted costs and he
allows only two critical numbers policies of the form $z = 0$.

Finally,  we  note  here  the  accounting  conventions  used  in  this 
dissertation. Theorems, propositions, equations and definition
statements are numbered consecutively within chapters, with the numbering
starting  anew  in  each  chapter.  Within  the  same  chapter,  theorems, 
propositions,  equations,  and  definition  statements  are  referenced  simply 
by  their  number,  whereas  cross  chapter  references  also  require  a chapter 
identification. 

CHAPTER 2

PRELIMINARIES 

In  this  chapter  we  present  some  terminology  and  preliminary  results 
from  the  theory  of  stochastic  integration.  This  material  will  be  used 
later  in  defining  and  discussing  the  stochastic  processes  that  arise  in 
conjunction with our control problem. The reader is referred to McKean
(1969), Gihman and Skorohod (1972), and Ash and Gardner (1975) for
background information on stochastic integrals and stochastic differential
equations.

2.1. A Variation of Ito's Lemma

We  begin  by  introducing  some  notation  and  terminology  that  will 
be  continued  throughout  this  paper.  Their  relevance  will  be  seen  in 
Chapter  3 in  the  construction  of  a class  of  controlled  diffusion  processes 
and  associated  value  functions. 

Start with a probability space $(\Omega, \mathcal{F}, P)$ on which is defined a
standard (zero drift and unit variance) Brownian Motion
$B = \{B(t);\ t \ge 0\}$ with $B(0) = 0$. Let $\{\mathcal{F}_t;\ t \ge 0\}$
be the increasing family of sub-$\sigma$-fields
$\mathcal{F}_t = \mathcal{F}\{B(u);\ 0 \le u \le t\}$ generated by the
process $B$. Let $H$ denote the set of functions
$f : \Omega \times [0,\infty) \to \mathbb{R}$ such that

(1) $f(\omega, t)$ is jointly measurable in $\omega$ and $t$,

(2) $f(\cdot, t)$ is $\mathcal{F}_t$ measurable for each $t \ge 0$, and

(3) $P\{\omega : \int_0^t f^2(\omega, u)\,du < \infty\} = 1$ for each $t \ge 0$.


Elements of $H$ are called non-anticipating Brownian functions, or just
non-anticipating functions for short. We can suppress the $\omega$ notation
in the above, refer to $\{f(t);\ t \ge 0\}$ as a non-anticipating process, and
re-state (3) as

(3') $\int_0^t f^2(u)\,du < \infty$ almost surely for each $t \ge 0$.

Now recall that for all $f \in H$, the stochastic integral
$\int_0^t f(u)\,dB(u)$ of $f$ with respect to the Brownian Motion $B$ is
well defined for each $t \ge 0$. The stochastic process $I_f$ defined by

     $I_f(t) = \int_0^t f(u)\,dB(u), \quad t \ge 0,$

is almost surely (a.s.) continuous in $t$. (Hereafter, we shall simply
say that $I_f$ is a continuous process.) Furthermore, if

     $\int_0^t E[f^2(u)]\,du < \infty, \quad t \ge 0,$

then we have

(4) $E[(I_f(t))^2] = \int_0^t E[f^2(u)]\,du$ and $E[I_f(t)] = 0$, $t \ge 0$.
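
The two moment identities above can be checked numerically in a rough way.
The sketch below is an illustration under stated assumptions, not part of the
development: it takes the particular integrand f(u) = B(u), for which the
right hand side of the second-moment identity is t^2/2, approximates the
stochastic integral by left-endpoint sums, and estimates both moments by
simulation. The function names, step count, path count and seed are all
choices made for the example.

```python
import math
import random


def ito_integral_of_b(t=1.0, n_steps=100, rng=random):
    """Left-endpoint (non-anticipating) sum approximating I_f(t) for f(u) = B(u)."""
    dt = t / n_steps
    b = 0.0        # B(0) = 0
    total = 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))
        total += b * db   # integrand is evaluated before the increment
        b += db
    return total


def sample_moments(n_paths=4000, t=1.0, n_steps=100, seed=0):
    """Monte Carlo estimates of E[I_f(t)] and E[(I_f(t))^2]."""
    rng = random.Random(seed)
    vals = [ito_integral_of_b(t, n_steps, rng) for _ in range(n_paths)]
    mean = sum(vals) / n_paths
    second = sum(v * v for v in vals) / n_paths
    return mean, second


if __name__ == "__main__":
    mean, second = sample_moments()
    # Theory for f = B and t = 1: mean 0 and second moment 1/2.
    print(mean, second)
```

The estimates carry both sampling error and a small discretization bias, so
they should be read as approximate agreement only.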


We shall call a process $\{X(t);\ t \ge 0\}$ an Ito process if it is
of the form

(5) $X(t) = X(0) + \int_0^t \mu(u)\,du + \int_0^t \sigma(u)\,dB(u), \quad t \ge 0,$

where $X(0)$ is a random variable measurable with respect to $\mathcal{F}_0$,
and $\mu$ and $\sigma$ belong to $H$. The first integral on the right hand
side of (5), called the drift component of process $X$, is defined for almost
all $\omega \in \Omega$ as a Lebesgue integral, so it is continuous as a
function of $t$. The stochastic integral in (5) is called the diffusion
component of $X$. Thus, $X$ is a continuous process and $X(t)$ is
$\mathcal{F}_t$ measurable for each $t \ge 0$.

The following is a statement of Ito's lemma, which is proved on
pages 24-25 of Gihman and Skorohod (1972).

Theorem 1. Let $X$ be an Ito process and $f : [0,\infty) \times \mathbb{R} \to \mathbb{R}$
a continuous function with continuous partial derivatives $f_t(t,x)$,
$f_x(t,x)$ and $f_{xx}(t,x)$. Then the process $\{f(t, X(t));\ t \ge 0\}$
is an Ito process satisfying

(6) $f(t, X(t)) = f(0, X(0)) + \int_0^t [f_u(u, X(u)) + \tfrac{1}{2} f_{xx}(u, X(u))\,\sigma^2(u) + f_x(u, X(u))\,\mu(u)]\,du$
         $+ \int_0^t f_x(u, X(u))\,\sigma(u)\,dB(u), \quad t \ge 0.$
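
As a small worked instance of Theorem 1, offered only as an illustration,
take the function $f(t,x) = x^2$, which is not used elsewhere in this chapter.
Its partial derivatives are $f_t = 0$, $f_x = 2x$ and $f_{xx} = 2$, and
substituting them into the formula of Theorem 1 gives the following.

```latex
% Ito's lemma applied to f(t,x) = x^2 for an Ito process X with
% drift integrand \mu(u) and diffusion integrand \sigma(u):
X(t)^2 = X(0)^2 + \int_0^t \bigl[\sigma^2(u) + 2X(u)\mu(u)\bigr]\,du
         + \int_0^t 2X(u)\sigma(u)\,dB(u), \qquad t \ge 0.

% Special case X = B (that is, \mu \equiv 0, \sigma \equiv 1, X(0) = 0):
B(t)^2 = t + 2\int_0^t B(u)\,dB(u), \qquad t \ge 0.
```

The special case shows the extra term $t$ that distinguishes the Ito
calculus from the ordinary chain rule.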
We would like to extend Ito's lemma to a broader class of functions
$f$ and a broader class of processes $X$. For $S$ an interval in
$\mathbb{R}$, $C^2(S)$ is the conventional notation for the set of twice
continuously differentiable functions on $S$. Let $C_*^2(S)$ denote the set
of continuously differentiable functions $f : S \to \mathbb{R}$ such that
$f''$ exists and is continuous at all but a finite number of points in each
finite interval of $S$, and such that the left and right second derivatives
$f''(x-)$ and $f''(x+)$ exist and are finite everywhere in $S$. The next
theorem is the extension of Ito's lemma that we will need later.

Theorem 2. Let $X$ and $Y$ be a pair of processes satisfying

(7) $X(t) = X(0) + \int_0^t \sigma(u)\,dB(u) + Y(t)$

for each $t \ge 0$, where $X(0)$ is $\mathcal{F}_0$ measurable, $\sigma$ is
bounded and non-anticipating, and $Y$ is a continuous non-anticipating
process of bounded variation. Let $f : [0,\infty) \times \mathbb{R} \to \mathbb{R}$
be such that

(8) $f(t,\cdot) \in C_*^2(\mathbb{R})$ for each $t \ge 0$, and

(9) $f(\cdot,x)$ is continuously differentiable on $[0,\infty)$
for each $x \in \mathbb{R}$.

Then the process $\{f(t, X(t));\ t \ge 0\}$ satisfies

(10) $f(t, X(t)) = f(0, X(0)) + \int_0^t [f_u(u, X(u)) + \tfrac{1}{2} f_{xx}(u, X(u))\,\sigma^2(u)]\,du$
         $+ \int_0^t f_x(u, X(u))\,\sigma(u)\,dB(u)$
         $+ \int_0^t f_x(u, X(u))\,dY(u)$ for each $t \ge 0$.

Remark. In familiar fashion, the process $Y$ may simply have the form

(11) $Y(t) = \int_0^t \mu(u)\,du, \quad t \ge 0,$

where $\mu$ is bounded and non-anticipating. In this case one clearly
substitutes $\mu(u)\,du$ for $dY(u)$ in (10). Our principal interest, however,
will be in processes $Y$ which are continuous and of bounded variation
but not absolutely continuous. (In general, our processes $Y$ will be
the sum of an absolutely continuous part of the form (11) and a continuous
part of bounded variation.) Such processes arise in the representation
of diffusions with reflecting barriers. Note that $X$ defined by (7)
above is not then an Ito process according to our definition (5).

Proof. The proof here relies heavily on a similar result, Theorem 2.2,
found in Kunita and Watanabe (1967). Their statement of (10) is for
functions $f : \mathbb{R}^N \to \mathbb{R}$ with continuous first and second
order partial derivatives, as applied to $\{\mathcal{F}_t;\ t \ge 0\}$
measurable processes $\{Z(t);\ t \ge 0\}$ of the form $Z(t) = M(t) + R(t)$,
where $\{M(t);\ t \ge 0\}$ is an $N$-dimensional martingale and
$\{R(t);\ t \ge 0\}$ is a continuous $N$-dimensional rectifiable process.
Here the process $X - Y$ is an $\{\mathcal{F}_t;\ t \ge 0\}$ measurable martingale,


where additionally

     $E\{[X(s) - Y(s) - X(t) + Y(t)]^2 \mid \mathcal{F}_t\} = E\{\int_t^s \sigma^2(u)\,du \mid \mathcal{F}_t\}$ for all $s > t \ge 0$.


Now suppose that instead of (8), we have that

(8') $f(t,\cdot) \in C^2(\mathbb{R})$

for each $t \ge 0$. Then we can apply the above Kunita and Watanabe
theorem to processes $X$ and $Y$, resulting in (10) where the last
integral is defined for almost all $\omega \in \Omega$ as a Lebesgue integral.

Fix $t > 0$. Since $X$ is continuous, we have that $f_u(u,x)$ and
$f_x(u,x)$ are continuous and bounded on $[0,t] \times S$, where $S$ is the
finite interval $[\min_{0 \le u \le t} X(u),\ \max_{0 \le u \le t} X(u)]$.
The second partial derivative $f_{xx}(u,x)$ is continuous on
$[0,t] \times S$ except at a finite number of points in $S$, while the left
and right second partials $f_{xx}(u,x-)$ and $f_{xx}(u,x+)$ are bounded on
$[0,t] \times S$. We now construct a sequence of functions
$f^n : [0,\infty) \times \mathbb{R} \to \mathbb{R}$ such that (8') and (9) hold
for each $f^n$, $f^n(t,x)$ converges to $f(t,x)$ pointwise on
$[0,t] \times S$, and such that $f^n_u(u,x)$, $f^n_x(u,x)$ and
$f^n_{xx}(u,x)$ converge almost everywhere to $f_u(u,x)$, $f_x(u,x)$ and
$f_{xx}(u,x)$, respectively, on $[0,t] \times S$. Therefore,
we have
convergence to zero in probability, as $n \to \infty$, of

     $\int_0^t [f^n_u(u, X(u)) - f_u(u, X(u))]\,du,$

     $\int_0^t [f^n_{xx}(u, X(u)) - f_{xx}(u, X(u))]\,\sigma^2(u)\,du,$

     $\int_0^t [f^n_x(u, X(u)) - f_x(u, X(u))]\,\sigma(u)\,dB(u),$

and

     $\int_0^t [f^n_x(u, X(u)) - f_x(u, X(u))]\,dY(u),$

which finally gives convergence in probability of the right hand side of
(10') to

     $f(0, X(0)) + \int_0^t [f_u(u, X(u)) + \tfrac{1}{2} f_{xx}(u, X(u))\,\sigma^2(u)]\,du$
         $+ \int_0^t f_x(u, X(u))\,\sigma(u)\,dB(u) + \int_0^t f_x(u, X(u))\,dY(u),$

as desired. □



2.2.  A Simple  Stochastic  Differential  Equation 

Later we will be describing controlled diffusion processes as
the solutions to stochastic integral equations of the form

     $X(t) = X(0) + \int_0^t \mu(X(u))\,du + \int_0^t \sigma(X(u))\,dB(u)$ for all $t \ge 0$,

where $\mu(\cdot)$ and $\sigma(\cdot)$ are piece-wise constant. Theorem 3 will
be helpful in discussion of the existence and uniqueness of such solutions.


Theorem 3. Let $\theta$, $\gamma$, $\alpha$, $\beta$ and $s$ be constants such
that $\gamma > 0$ and $\beta > 0$. Given $X(0)$, an $\mathcal{F}_0$ measurable
random variable, there exists a unique (in distribution) non-anticipating
process $X$ satisfying

(12) $X(t) = X(0) + \int_0^t [\theta\,\chi\{X(u) > s\} + \alpha\,\chi\{X(u) \le s\}]\,du$
         $+ \int_0^t [\gamma\,\chi\{X(u) > s\} + \beta\,\chi\{X(u) \le s\}]\,dB(u),$
         for all $t \ge 0$,

where $\chi\{g\}$ is the indicator function of event $g$.


Remark. If $X$ is non-anticipating, then the integrands on the right hand
side of (12) are non-anticipating Brownian functions. Thus the Ito integral
in (12) is well-defined and $X$ is an Ito process.
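
To make the equation of Theorem 3 concrete, the following is a minimal
Euler-type simulation sketch. It is an illustration only and plays no role
in the proof. The symbol names are a reading of the theorem assumed here:
theta and gamma are the drift and diffusion coefficients used when the state
is above s, alpha and beta those used otherwise, and the point x = s is
assigned to the lower regime. The function names, step size and seed are
likewise assumptions of the example.

```python
import math
import random


def drift(x, s, theta, alpha):
    """Piece-wise constant drift: theta above s, alpha at or below s."""
    return theta if x > s else alpha


def vol(x, s, gamma, beta):
    """Piece-wise constant diffusion coefficient: gamma above s, beta otherwise."""
    return gamma if x > s else beta


def euler_path(x0, s, theta, alpha, gamma, beta, t=1.0, n_steps=1000, seed=0):
    """Euler approximation of X on [0, t]; returns the list of n_steps + 1 values."""
    if gamma <= 0 or beta <= 0:
        raise ValueError("gamma and beta must be positive")
    rng = random.Random(seed)
    dt = t / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over dt
        x = x + drift(x, s, theta, alpha) * dt + vol(x, s, gamma, beta) * db
        path.append(x)
    return path


if __name__ == "__main__":
    path = euler_path(x0=1.0, s=0.0, theta=-1.0, alpha=1.0, gamma=0.5, beta=2.0)
    print(path[-1])
```

The scheme is a numerical convenience; existence and uniqueness are
established in the text by a time substitution, not by discretization.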

Proof. Letting $\mu$ and $\sigma$ be the following piece-wise continuous
functions on $\mathbb{R}$

     $\mu(x) = \alpha$ if $x \le s$, $\quad \mu(x) = \theta$ if $x > s$,

and

     $\sigma(x) = \beta$ if $x \le s$, $\quad \sigma(x) = \gamma$ if $x > s$,

we begin by applying a standard random time substitution and state
transformation to process $B$. The time substitution is as discussed
on pages 111-115 in Gihman and Skorohod (1972). That is, for all
$t \ge 0$ define $T(t)$ by

(13) $t = \int_0^{T(t)} \frac{1}{\hat{\sigma}^2(B(u))}\,du,$

where $\hat{\sigma}(\cdot)$ is a positive piece-wise continuous function (to be
specified in a moment) such that

(14) $\int_0^\infty \frac{1}{\hat{\sigma}^2(B(u))}\,du = +\infty$, a.s.,

and

(15) $\int_0^t E\left\{\frac{1}{\hat{\sigma}^2(B(u))}\right\}du < \infty$, for all $t \ge 0$.

Note that $T(\cdot)$ defined by (13) is a strictly increasing and continuous
one-one mapping of $[0,\infty)$ into $[0,\infty)$ such that $T(0) = 0$.
Therefore, the process $\{W(t);\ t \ge 0\}$ defined by $W(t) = B(T(t))$
satisfies

(16) $W(t) = W(0) + \int_0^t \hat{\sigma}(W(u))\,dB^*(u)$, for all $t \ge 0$,

where $\{B^*(t);\ t \ge 0\}$ is the Brownian Motion defined on
$(\Omega, \mathcal{G}, P)$ by

(17) $B^*(t) = \int_0^{T(t)} \frac{1}{\hat{\sigma}(B(u))}\,dB(u)$, for all $t \ge 0$.

(Since $T(t)$ is trivially measurable for all $t \ge 0$, $B^*(t)$ is
also, and hence $W(t)$ is likewise properly measurable.) We now scale
the state space of process $W$ by introducing the following function on
$\mathbb{R}$:

     $g(x) = \int_0^x \exp\left\{-\int_0^y \frac{2\mu(z)}{\sigma^2(z)}\,dz\right\}dy$, if $x \ge 0$,

     $g(x) = -\int_0^{-x} \exp\left\{\int_0^y \frac{2\mu(-z)}{\sigma^2(-z)}\,dz\right\}dy$, if $x < 0$.

Since $g$ is strictly increasing and continuous in $x$ its inverse
function $f = g^{-1}$ is well-defined, and therefore the process
$\{X^*(t);\ t \ge 0\}$ defined by $X^*(t) = X(0) + f(W(t))$ is
$\{\mathcal{G}_t;\ t \ge 0\}$ measurable. Ito's lemma now gives us that

(18) $X^*(t) = X^*(0) + \int_0^t \tfrac{1}{2} f''(W(u))\,\hat{\sigma}^2(W(u))\,du + \int_0^t f'(W(u))\,\hat{\sigma}(W(u))\,dB^*(u),$
         for all $t \ge 0$,

so if we specify $\hat{\sigma}$ such that
$\hat{\sigma}^2(W(t)) = [g'(X^*(t))]^2\,\sigma^2(X^*(t))$ we will
have that $X^*$ satisfies

(19) $X^*(t) = X^*(0) + \int_0^t \mu(X^*(u))\,du + \int_0^t \sigma(X^*(u))\,dB^*(u)$, for all $t \ge 0$.

Moreover, the boundedness of functions $\mu$ and $\sigma$ guarantees that
$X^*$ is non-anticipating and that
$\sup_{0 \le t \le T} E[(X^*(t))^2] < \infty$ for all $T > 0$.
Thus we have shown the existence of a non-anticipating process $X$
satisfying (12), since if there exists a non-anticipating $X^*$ satisfying
(19) for a particular Brownian Motion $B^*$ there then exists such an $X$
for any Brownian Motion $B$. This can be seen by looking at
$F : C[0,\infty) \to C[0,\infty)$, a path-to-path mapping such that
$X^* = F(B^*)$, and letting $X = F(B)$. ($C[0,\infty)$ denotes the set of
all continuous functions on $[0,\infty)$.)

We now prove the uniqueness (in distribution) of $X$ by showing
any solution to (12) to be a diffusion process with drift coefficient
$\mu(x)$ and diffusion coefficient $\sigma^2(x)$. That is, following the
definition of diffusion in Breiman (1968), we need verify that $X$ is a
Feller process such that for all $\varepsilon > 0$ and all $x \in \mathbb{R}$

i) $\lim_{\Delta \downarrow 0} \frac{1}{\Delta} \int_{|y-x| > \varepsilon} P(t,x,t+\Delta,dy) = 0,$

ii) $\lim_{\Delta \downarrow 0} \frac{1}{\Delta} \int_{|y-x| \le \varepsilon} (y-x)\,P(t,x,t+\Delta,dy) = \mu(x),$

and

iii) $\lim_{\Delta \downarrow 0} \frac{1}{\Delta} \int_{|y-x| \le \varepsilon} (y-x)^2\,P(t,x,t+\Delta,dy) = \sigma^2(x),$

where $P(t,x,\tau,U)$ is the transition probability function for $X$
($0 \le t < \tau$ and $U$ a Borel set). Theorem 1 on page 67 of Gihman and
Skorohod (1972) demonstrates immediately that $X$ is a Feller process,
and the arguments that follow on pages 68-69 will also carry over here
to prove i), ii) and iii) above. □

Note that if, in the statement of the theorem, we had allowed
$\gamma = \beta = 0$, no stochastic integral would have been involved.
Equation (12) could then be viewed as an ordinary differential equation to
be solved for each $\omega$, but there is no guarantee that a solution
exists. See page 78 of McKean (1969).

We  make  note  also  of  our  belief  that  the  process  X as  a solution 
to  (12)  is  indeed  unique  up  to  a stochastic  equivalence,  although  we  did 
not  prove  so  here.  Uniqueness  in  distribution  is  all  that  we  will  need 
later. 




CHAPTER  3 


THE  GENERAL  FORMULATION 

In  this  chapter  we  formulate  precisely  a class  of  problems  involving 
the  optimal  control  of  one-dimensional  Brownian  Motion.  For  each  admissible 
control  strategy,  there  is  a corresponding  controlled  diffusion  process 
which  generates  costs  according  to  a specified  mechanism.  The  objective 
is  to  construct  an  admissible  control  strategy  which  minimizes  expected 
discounted  costs  over  an  infinite  planning  horizon. 

The  first  section  discusses  the  states  of  the  system,  the  available 
actions,  and  the  cost  structure.  Section  3.2  defines  an  admissible 
strategy and the associated controlled process. In Section 3.3 we treat
the  expected  discounted  costs  corresponding  to  an  admissible  strategy 
and  define  optimality.  The  fourth  section  develops  stationary  policies 
and  their  expected  discounted  costs. 

(1) $K_{ii} = 0$, for all $i \in A$,

and

(2) $K_{ij} \le K_{ia} + K_{aj}$, for all $i, j, a \in A$.

Next, there is a holding cost rate $h$, a boundary cost $R$ and an interest
rate $\alpha > 0$. (Note that the term "cost" is rather artificial since $h$,
$R$ and each of the $r_a$ are unrestricted in sign.) Finally, we are given
a boundary parameter $\lambda \in \{0, 1\}$.

By way of interpretation, we imagine a controller who continuously
monitors the state of a system $\{X(t);\ t \ge 0\}$ and who must employ at
each point in time $t \ge 0$ some control mode $a \in A$. The state space
for the problem is $S = [0,\infty)$ and the action space is the set $A$. When
mode $a$ is in use, the state of the system changes like a one-dimensional
Brownian Motion with infinitesimal drift $\mu_a$, infinitesimal variance
$\sigma_a^2$, and either absorption or instantaneous reflection at the
origin. If $\lambda = 0$, then the barrier at zero is absorbing, and if
$\lambda = 1$, there is instantaneous reflection at the barrier.

There are costs associated with the trajectory of $X$ that are
dependent on the controls employed and the state of the system. First,
costs are continuously incurred at a rate $hx + r_a$ whenever the state
of the system is $x > 0$ and control mode $a$ is in use, and at a rate
$(1-\lambda)\alpha R$ when the state of the system is zero. Additionally,
the lump sum cost $K_{ij}$ is incurred instantaneously whenever there is a
change from control mode $i$ to mode $j$. And finally, all costs are
discounted at interest rate $\alpha > 0$. (Thus, a cost $C$ incurred at time
$t$ is equivalent to a cost of $Ce^{-\alpha t}$ incurred at time zero.) These
interpretations for the data of the problem will be justified by the formal
definitions of the next two sections.
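
The cost mechanism just described can be illustrated on a discretely sampled
trajectory. The sketch below is an assumption-laden approximation rather than
the formal definition given later: it uses a left-endpoint sum for the running
cost, treats mode changes as occurring at sample times, and omits the cost
incurred at the boundary. The function name discounted_cost, the dictionary
representations of the operating and switching costs, and the argument name
alpha for the interest rate are all hypothetical.

```python
import math


def discounted_cost(dt, states, modes, h, r, K, alpha):
    """Approximate discounted running plus switching costs on a sampled path.

    states[i], modes[i] : state and control mode at time i * dt
    h     : holding cost rate
    r     : dict mapping mode -> operating cost rate
    K     : dict mapping (i, j) -> lump sum cost of switching from i to j
    alpha : interest rate, alpha > 0
    """
    total = 0.0
    for i, (x, a) in enumerate(zip(states, modes)):
        disc = math.exp(-alpha * i * dt)      # discount factor at time i * dt
        total += disc * (h * x + r[a]) * dt   # running cost over [i*dt, (i+1)*dt)
        if i > 0 and modes[i - 1] != a:
            total += disc * K[(modes[i - 1], a)]   # lump sum switching cost
    return total


if __name__ == "__main__":
    cost = discounted_cost(
        dt=1.0, states=[1.0, 1.0], modes=[1, 2],
        h=1.0, r={1: 0.0, 2: 0.5}, K={(1, 2): 3.0}, alpha=0.1,
    )
    print(cost)
```

In the example call the first step contributes a holding cost only, and the
second step contributes a discounted holding, operating and switching cost.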

Based on the switching costs we can divide up the action space
into disjoint equivalence classes. For each action $a \in A$, let
$\theta(a) = \{j \in A : K_{aj} = K_{ja} = 0\}$. Thus $\theta(a)$ is the set
of all control modes that can be reached from $a$ and from which $a$ can be
reached without switching costs. From (2) it follows that for any $a$ and
$i$ in $A$, either the sets $\theta(a)$ and $\theta(i)$ are identical or have
an empty intersection. Therefore, there exist $M$ disjoint equivalence
classes, denoted $A_1, \ldots, A_M$, where $\bigcup_{k=1}^{M} A_k = A$ and
$A_k = \bigcup_{a \in A_k} \theta(a) = \bigcap_{a \in A_k} \theta(a)$
for each $k \in \{1, 2, \ldots, M\}$. It costs nothing to switch in either
direction among actions in any one equivalence class, and there is a
positive cost to switching in some direction among actions of different
equivalence classes. Furthermore, there exist non-negative costs $C_{k\ell}$
for $k$ and $\ell$ in $\{1, 2, \ldots, M\}$ representing the costs of
switching between equivalence classes that satisfy

(3) $C_{k\ell} = K_{ij}$, for all $i \in A_k$ and all $j \in A_\ell$, for all $k, \ell \in \{1, 2, \ldots, M\}$,

(4) $C_{kk} = 0$, for all $k \in \{1, 2, \ldots, M\}$,

(5) $C_{k\ell} \le C_{km} + C_{m\ell}$, for all $k, \ell, m \in \{1, 2, \ldots, M\}$.

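
The construction of the equivalence classes can be sketched computationally
for a small switching-cost matrix. This is an illustration under assumptions,
not the text's own procedure: actions are indexed 0, 1, ..., n-1, the matrix
K is a list of lists with non-negative entries, and the function names are
hypothetical. The grouping relies on the zero-diagonal and triangle
conditions, which the first function checks.

```python
def satisfies_switching_conditions(K):
    """Check K[i][i] == 0 and K[i][j] <= K[i][a] + K[a][j] for all i, j, a."""
    n = len(K)
    if any(K[i][i] != 0 for i in range(n)):
        return False
    return all(
        K[i][j] <= K[i][a] + K[a][j]
        for i in range(n) for j in range(n) for a in range(n)
    )


def equivalence_classes(K):
    """Group actions that can be switched between, in both directions, at zero cost."""
    n = len(K)
    classes, seen = [], set()
    for a in range(n):
        if a in seen:
            continue
        cls = {j for j in range(n) if K[a][j] == 0 and K[j][a] == 0}
        classes.append(sorted(cls))
        seen |= cls
    return classes


def class_costs(K, classes):
    """Cost of switching between classes, read off from one representative each."""
    return [[K[ck[0]][cl[0]] for cl in classes] for ck in classes]


if __name__ == "__main__":
    K = [[0, 0, 2],
         [0, 0, 2],
         [1, 1, 0]]
    classes = equivalence_classes(K)
    print(satisfies_switching_conditions(K), classes, class_costs(K, classes))
```

Reading the class costs from a single representative is legitimate only when
the triangle condition holds, which is why the check is kept alongside.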

We end this section with one final note regarding the generality
of our switching costs. Originally, we could have allowed negative
switching costs if, in addition to (1) and (2), we had required that

     $K_{a(1)a(2)} + K_{a(2)a(3)} + \cdots + K_{a(p-1)a(p)} \ge 0$

for every finite sequence $a(1), a(2), \ldots, a(p)$ selected from $A$ where
$a(1) = a(p)$. The resultant equivalence class costs then would likewise
no longer be necessarily non-negative but would satisfy the above
non-negative circuit condition in addition to (3), (4) and (5). However,
for the sake of simplicity, we have required that $K_{ij} \ge 0$ for all
actions $i$ and $j$.

3.2.  Admissible  Strategies 

As in Section 2.1, we start with a given probability space
$(\Omega, \mathcal{F}, P)$, on which is defined a standard (zero drift and
unit variance) Brownian Motion $B = \{B(t);\ t \ge 0\}$ with $B(0) = 0$. Let
$\{\mathcal{F}_t;\ t \ge 0\}$ be the increasing (and continuous) family of
sub-$\sigma$-fields generated by $B$. We now describe what is allowed as a
control strategy.

Definition. An admissible strategy is a function
$\pi : \Omega \times [0,\infty) \to A$ such that

(6) $\pi(\omega, t)$ is jointly measurable in $\omega$ and $t$,

(7) $\pi(\cdot, t)$ is $\mathcal{F}_t$ measurable for all $t \ge 0$, and

(8) the function $\theta^* : \Omega \times [0,\infty) \to \{1, 2, \ldots, M\}$
defined by $\theta^*(\omega, t) = \theta(\pi(\omega, t))$ has only finitely
many discontinuities in each finite interval of time.

Hereafter we will suppress the dependence of $\pi$ on $\omega$, and so
$\{\pi(t);\ t \ge 0\}$ is the process representing the action used under
strategy $\pi$ at each point in time $t \ge 0$.

We now associate with each admissible strategy $\pi$ and initial
state $x$ a corresponding controlled Brownian Motion. It will of course
be necessary to distinguish between problems with absorption ($\lambda = 0$)
and those with reflection ($\lambda = 1$).

Theorem 1. Let $\pi$ be an admissible strategy and assume $x \in S$. There
exists a unique pair of non-anticipating processes $X$ and $Y$ which
jointly satisfy

(9) $X(t) = x + \int_0^t \mu_{\pi(u)}\,du + \int_0^t \sigma_{\pi(u)}\,dB(u) + Y(t)$, for all $t \ge 0$,

(10) $Y(\cdot)$ is continuous, non-decreasing with $Y(0) = 0$, and grows
only when $X(t) = 0$.

Remark. From (9) and (10) it follows that $X$ is continuous with
$X(0) = x$. Explicit formulas for $X$ and $Y$ are given in the proof
that follows.

Proof. By the definition of admissible strategy and the boundedness of $\mu_a$ and $\sigma_a$, the processes $\mu_{\pi(\cdot)}$ and $\sigma_{\pi(\cdot)}$ are easily seen to be non-anticipating. Hence the process $\{Z(t);\ t \ge 0\}$ uniquely defined by

$Z(t) = x + \int_0^t \mu_{\pi(u)}\,du + \int_0^t \sigma_{\pi(u)}\,dB(u)$, $t \ge 0$,

is an Itô process. Since $Z$ is continuous we have that

$\int_0^t Z^2(u)\,du < \infty$ almost surely for each $t \ge 0$,

and so $Z$ is a non-anticipating process. Let $\{Y(t);\ t \ge 0\}$ be defined by

(11) $Y(t) = \left[-\inf_{0 \le u \le t} Z(u)\right]^+$, $t \ge 0$.

Clearly $Y$ is continuous, non-decreasing and non-anticipating, and $Y(0) = 0$. Now let $\{X(t);\ t \ge 0\}$ be such that

$X(t) = Z(t) + Y(t)$, for each $t \ge 0$.

Since

$\left[-\inf_{0 \le u \le t} Z(u)\right]^+ \ge [-Z(t)]^+$, $t \ge 0$,

$X(t)$ is non-negative for each $t \ge 0$. Suppose that $X(t) > 0$. This implies that

$-Y(t) = \inf_{0 \le u \le t} Z(u) < 0$ and $Z(t) > \inf_{0 \le u \le t} Z(u)$,


and hence

$\inf_{0 \le u \le t} Z(u) = \inf_{0 \le u < t} Z(u)$.

Thus $Y$ does not grow at $t$, and we have a pair of non-anticipating processes, $X$ and $Y$, that satisfy (9) and (10).

Suppose now that the process $\{\tilde{Y}(t);\ t \ge 0\}$ is continuous, non-anticipating, non-decreasing from $\tilde{Y}(0) = 0$ and grows only if $\tilde{X}(t) = 0$, where $\tilde{X}(t) = Z(t) + \tilde{Y}(t)$. Let $\tau$ be the Markov time (with respect to $\{\mathcal{F}_t;\ t \ge 0\}$) defined by

$\tau = \inf\{t \ge 0 : \tilde{Y}(t) \ne Y(t)\}$,

and suppose that $\tau < +\infty$ and $Y(\tau+) < \tilde{Y}(\tau+)$. Then there exists a $\tau' > \tau$ such that $Y(t) < \tilde{Y}(t)$ on $(\tau, \tau')$. Hence $\tilde{Y}$ must grow at each $t \in (\tau, \tau')$, implying that $X(t) < \tilde{X}(t) = 0$ on $(\tau, \tau')$, which contradicts that $X$ is a non-negative process. Similarly for $\tau < +\infty$ and $Y(\tau+) > \tilde{Y}(\tau+)$.

It must be then that $\tau = +\infty$, thereby proving the uniqueness of our solution to (9) and (10). That is, the process $Y$ defined by (11) is the unique minimal process needed to maintain the non-negativity of process $X$ as defined by (9). □
Our controlled processes are now defined as follows. If $\lambda = 1$ (reflection), then for each $x \in S$ and each admissible strategy $\pi$ define process $\{X(t|x,\pi);\ t \ge 0\}$ exactly as process $X$ in Theorem 1, the notation being enriched to indicate explicitly the dependence of $X$ on $\pi$ and $x$. When $\lambda = 0$ (absorption), we define $\{X(t|x,\pi);\ t \ge 0\}$ by $X(t|x,\pi) = X(t \wedge T)$ in Theorem 1, where

$T = \inf\{t \ge 0 : X(t) = 0\}$.

In either case (absorption or reflection), we call $\{X(t|x,\pi);\ t \ge 0\}$ the controlled process generated by strategy $\pi$ and initial state $x$.
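The construction in Theorem 1 can be illustrated numerically. The following is a minimal sketch, not part of the original development: it discretizes the free process Z by an Euler scheme, builds Y as the positive part of minus the running infimum of Z, and stops the path at zero in the absorbing case. The function name, the feedback rule `mode_at`, and all parameter values are assumptions chosen for illustration only.

```python
import math
import random


def simulate_controlled_bm(x0, mode_at, mu, sigma, lam, horizon=1.0, n=1000, seed=0):
    """Euler sketch of X = Z + Y with Y(t) = max(0, -min_{u<=t} Z(u)).

    mode_at(t, x) is a hypothetical feedback rule giving the mode in use,
    mu[a] and sigma[a] are the drift and standard deviation of mode a,
    lam = 1 reflects at zero and lam = 0 absorbs at zero.
    """
    rng = random.Random(seed)
    dt = horizon / n
    z = x0                      # free process Z (no boundary behavior)
    running_min = x0            # minimum of Z over [0, t]
    xs, ys = [x0], [0.0]
    absorbed = (lam == 0 and x0 == 0.0)
    for i in range(n):
        if absorbed:            # stopped process X(t ^ T): stay at zero
            xs.append(0.0)
            ys.append(ys[-1])
            continue
        a = mode_at(i * dt, xs[-1])
        z += mu[a] * dt + sigma[a] * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        running_min = min(running_min, z)
        y = max(0.0, -running_min)   # minimal pushing process Y
        x = z + y
        if lam == 0 and x <= 0.0:    # first visit to the origin
            absorbed = True
            x = 0.0
        xs.append(x)
        ys.append(y)
    return xs, ys
```

On a discrete grid the path can only approximate the continuous-time processes, but the sampled X stays non-negative and the sampled Y is non-decreasing, mirroring condition (10).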

3.3.  Expected  Costs  and  Optimality 

Start with a given admissible strategy $\pi$ and initial control mode $a \in A$. For each pair of equivalence classes $k$ and $\ell$ in $\{1, 2, \ldots, M\}$, define a counting process $\{Q_{k\ell}(t|a,\pi);\ t \ge 0\}$ as follows. For $t > 0$, let $Q_{k\ell}(t|a,\pi)$ be the supremum of all $m \in \{0, 1, 2, \ldots\}$ such that there exist $0 < \tau_1 < \tau_2 < \cdots < \tau_{2m-1} < \tau_{2m} < t$, where for $n = 1, 3, 5, \ldots, 2m-1$ we have

(12) $\pi(\tau_n) \in A_k$,

(13) $\pi(u) \in (A_k \cup A_\ell)$ for all $u \in (\tau_n, \tau_{n+1})$,

(14) $\pi(\tau_{n+1}) \in A_\ell$.
We now define $Q_{k\ell}(0|a,\pi)$ by

$Q_{k\ell}(0|a,\pi) = \lim_{t \downarrow 0} Q_{k\ell}(t|a,\pi)$.

(Such a limit exists for all $k$ and $\ell$ in $\{1, 2, \ldots, M\}$ and all $a \in A$, since $Q_{k\ell}(\cdot|a,\pi)$ is a non-decreasing function of $t$.)

Clearly, $Q_{k\ell}(t|a,\pi)$ is $\mathcal{F}_t$ measurable for each $t \ge 0$, all $k, \ell \in \{1, 2, \ldots, M\}$, and all $a \in A$. Note that if for any $t > 0$ there exist $k \in \{1, 2, \ldots, M\}$ and $(\tau, \tau') \subset (0,t)$ such that $\theta^*(u) = k$ on $(\tau, \tau')$, then $Q_{kk}(t|a,\pi) = \infty$. But for $t > 0$ and all $k, \ell \in \{1, 2, \ldots, M\}$ where $k \ne \ell$, $Q_{k\ell}(t|a,\pi)$ must be finite by virtue of condition (8) in the definition of admissible strategies. Therefore, except for the case $k = \ell$, we interpret $Q_{k\ell}(t|a,\pi)$ as the number of switches made under strategy $\pi$ from within equivalence class $k$ to within equivalence class $\ell$ during the time interval $[0,t]$.
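For a strategy sampled on a time grid, the interpretation of the counting process as a number of class switches can be sketched as follows. This is an illustrative simplification under stated assumptions: the mode path is given as a list whose first entry is the initial mode, `theta` is a hypothetical dictionary mapping each mode to its equivalence class, and only direct changes from class k to a different class l are counted.

```python
def count_class_switches(modes, theta, k, l):
    """Count changes from class k to class l (k != l) along a sampled mode path.

    modes[0] is taken to be the initial control mode; theta[a] is the
    equivalence class of mode a. Both names are illustrative assumptions.
    """
    if k == l:
        raise ValueError("only switches between distinct classes are counted")
    count = 0
    prev = theta[modes[0]]
    for a in modes[1:]:
        cur = theta[a]
        if prev == k and cur == l:
            count += 1
        prev = cur
    return count
```

With modes 1 and 2 in class 1 and mode 3 in class 2, a path such as 1, 2, 3, 3, 1, 3 contains two switches from class 1 to class 2 and one in the opposite direction; the change from mode 1 to mode 2 stays inside class 1 and is not counted.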

Now let us represent the continuous costs of our system by the function $g : S \times A \to \mathbb{R}$, where

$g(x,a) = hx + r_a$ if $x > 0$, and $g(x,a) = (1-\lambda)\alpha R$ if $x = 0$,

for each $a \in A$. Recalling that $X(\cdot|x,\pi)$ is the controlled process generated by strategy $\pi$ and initial state $x$, we define a function $V_\pi$ on $S \times A$ by
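(15) $V_\pi(x,a) = E\left[\int_0^\infty e^{-\alpha t}\, g(X(t|x,\pi), \pi(t))\,dt + \sum_{k=1}^{M}\sum_{\ell=1}^{M} C_{k\ell} \int_0^\infty e^{-\alpha t}\,dQ_{k\ell}(t|a,\pi)\right]$,

where $Q_{k\ell}(\cdot|a,\pi)$ also denotes the counting measure associated with counting process $Q_{k\ell}(\cdot|a,\pi)$, and where by convention

$C_{k\ell}\int_0^\infty e^{-\alpha t}\,dQ_{k\ell}(t|a,\pi) = 0$ if $C_{k\ell} = 0$ and $Q_{k\ell}(\infty|a,\pi) = \infty$.

Thus $V_\pi(x,a)$ represents the expected total discounted costs generated by strategy $\pi$, given initial state $x \in S$ and initial control mode $a \in A$. We shall call $V_\pi$ the value function for strategy $\pi$.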
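The cost inside the expectation in (15) can be approximated along a single sampled path. The sketch below is an assumption-laden illustration rather than the author's procedure: it uses a left Riemann sum for the running cost, charges a class switching cost at the grid time where the class changes, and treats all names (`discounted_cost`, `theta`, `C`, and the parameter values in the usage note) as hypothetical.

```python
import math


def discounted_cost(xs, modes, dt, alpha, h, r, theta, C, R, lam, initial_mode):
    """Left Riemann sum for the discounted cost of one sampled path.

    xs[i] and modes[i] are the state and mode on [i*dt, (i+1)*dt);
    r[a] is the operating cost rate, theta[a] the class of mode a, and
    C[(k, l)] the class switching cost (missing pairs cost nothing).
    """
    total = 0.0
    prev_class = theta[initial_mode]
    for i, (x, a) in enumerate(zip(xs, modes)):
        disc = math.exp(-alpha * i * dt)
        cur_class = theta[a]
        if cur_class != prev_class:      # lump-sum class switch at time i*dt
            total += disc * C.get((prev_class, cur_class), 0.0)
        # running cost g(x, a): holding plus operating cost, or alpha*R at zero
        g = h * x + r[a] if x > 0 else (1 - lam) * alpha * R
        total += disc * g * dt
        prev_class = cur_class
    return total
```

Averaging this quantity over many simulated paths would give a Monte Carlo estimate of the value function for a given strategy, up to discretization and truncation error.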


Theorem 2. Let $\pi$ be an admissible strategy. Then for all $a \in A$, $|V_\pi(x,a) - hx/\alpha|$ is bounded for all $x \in S$ if and only if $E\left[\int_0^\infty e^{-\alpha t}\,dQ_{k\ell}(t|a,\pi)\right] < \infty$ for all $k$ and $\ell$ in $\{1, 2, \ldots, M\}$ where $C_{k\ell} > 0$.


Proof. Fix $(x,a) \in S \times A$. For all $t \ge 0$, we may bound $g(X(t|x,\pi), \pi(t))$ by $\max(hX(t|x,\pi) + r^*,\ (1-\lambda)\alpha R)$, where $r^* = \max(r_1, r_2, \ldots, r_N)$. Hence we now set out to bound $E\left[\left|\int_0^\infty e^{-\alpha t} g(X(t|x,\pi), \pi(t))\,dt - hx/\alpha\right|\right]$ by effectively doing so with $E\left[\left|\int_0^\infty e^{-\alpha t}\, hX(t|x,\pi)\,dt - hx/\alpha\right|\right]$.

We restrict attention here to the case of reflection, since $E\left[\int_0^\infty e^{-\alpha t} X(t|x,\pi)\,dt\right]$ is greater with a reflected process than with an absorbed one. Let $\{Z_x(t);\ t \ge 0\}$ denote unrestricted Brownian Motion
we have that $W_x(t)$ is equivalent to $x + Z_0(t) + [m_0(t) - x]^+$ for all $t \ge 0$. Therefore we get the following bound for each $t \ge 0$,

$E[W_x(t)] \le x + E[Z_0(t) + m_0(t)]$.

Now

$Z_0(t) + m_0(t) \sim M_0(t)$, for all $t \ge 0$,

and if $\mu = 0$ we have the exact calculations, e.g., Karlin and Taylor (1975),

$E[M_0(t)] = \sigma\sqrt{2t/\pi}$ and $E[W_x(t)] \le x + \sigma\sqrt{2t/\pi}$.

For $\mu < 0$, we note that $M_0(t) \uparrow M_0$ as $t \to \infty$ and that $M_0$ is exponentially distributed with parameter $2|\mu|/\sigma^2$. Hence

$E[W_x(t)] \le x + \dfrac{\sigma^2}{2|\mu|}$, for all $t \ge 0$ and $\mu < 0$.

For $\mu > 0$, we have that $m_0(t) \uparrow m_0$ as $t \to \infty$ and

$E(m_0) = \dfrac{\sigma^2}{2\mu}$,

since minus the infimum of unrestricted Brownian Motion with positive drift $|\mu|$, variance $\sigma^2$ and initial state zero is equivalent to the supremum of unrestricted Brownian Motion with negative drift $-|\mu|$, variance $\sigma^2$, and initial state zero. Thus

$E[W_x(t)] \le x + \mu t + \dfrac{\sigma^2}{2\mu}$, for all $t \ge 0$ and $\mu > 0$.
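As a worked step, and assuming the bound for positive drift has the form $E[W_x(t)] \le x + \mu t + \sigma^2/(2\mu)$, integrating it against the discount factor shows why the resulting bound is linear in $x$ with an $x$-independent remainder. This is a sketch of the integration only, not a restatement of the author's final constants.

```latex
\int_0^\infty e^{-\alpha t}\Bigl(x + \mu t + \frac{\sigma^2}{2\mu}\Bigr)\,dt
  = \frac{x}{\alpha} + \frac{\mu}{\alpha^2} + \frac{\sigma^2}{2\mu\alpha},
\qquad\text{so}\qquad
E\Bigl[\int_0^\infty e^{-\alpha t}\, W_x(t)\,dt\Bigr] - \frac{x}{\alpha}
  \;\le\; \frac{\mu}{\alpha^2} + \frac{\sigma^2}{2\mu\alpha}.
```

The first term uses $\int_0^\infty e^{-\alpha t}\,dt = 1/\alpha$ and the second uses $\int_0^\infty t\,e^{-\alpha t}\,dt = 1/\alpha^2$.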


Returning  to  our  reflected  controlled  process,  we  have  shown  that 


E[X(  t |x,  tt)  1 < x + max  J n*t  + j , for  all  t > 0 , 


V7 


# 2 2 o 

where  a = maxfc^,  Og,  ...,  ojj},  n*  = minfmind^ | ^ 0,  i = 1,2, 

* 

and  [i  as  before.  Thereby  in  integrating,  we  can  bound 

00 

E [/  e X(t|x,  ir)dt]  by  a finite  valued  linear  function  of  x 
0 

namely. 


This means that $E\left[\left|\int_0^\infty e^{-\alpha t}\, hX(t|x,\pi)\,dt - hx/\alpha\right|\right]$, and hence $E\left[\left|\int_0^\infty e^{-\alpha t} g(X(t|x,\pi), \pi(t))\,dt - hx/\alpha\right|\right]$, are each bounded for all $x \in S$. It then follows that $|V_\pi(x,a) - hx/\alpha|$ is likewise bounded in $S$ if and only if $E\left[\sum_{k=1}^{M}\sum_{\ell=1}^{M} \int_0^\infty C_{k\ell}\, e^{-\alpha t}\,dQ_{k\ell}(t|a,\pi)\right]$ is. So our proposed condition on the counting processes $Q_{k\ell}(\cdot|a,\pi)$ is necessary and sufficient for the value function $V_\pi(x,a)$ to be finite for all $(x,a) \in S \times A$, in which case $|V_\pi(x,a)|$ is bounded by a linear function in $x$ for all $a \in A$. □

The optimal value function $V^* : S \times A \to \mathbb{R}$ is defined by

(16) $V^*(x,a) = \inf_\pi V_\pi(x,a)$, for all $(x,a) \in S \times A$,

where the infimum is taken over all admissible strategies $\pi$. Admissible strategy $\pi$ is called $(x,a)$-optimal if

(17) $V_\pi(x,a) = V^*(x,a)$,

and given initial state $x \in S$ and initial control mode $a \in A$, our control problem is to construct an admissible strategy that is $(x,a)$-optimal.


3.4. Stationary Policies

We now define stationary policies and show how each such policy generates an admissible strategy for each initial state $x$ and initial mode $a \in A$.

Definition. A stationary policy is a function $f : S \times A \to A$ satisfying

(18) for each $a \in A$, $f(x,a)$ has finitely many discontinuities in $x \in S$,

(19) if $\lambda = 0$, then $f(0,a) = a$ for each $a \in A$,

(20) for each $(x,a) \in S \times A$, there exists an $\varepsilon > 0$ such that $f(y, f(x,a)) = f(x,a)$ for all $y \in (x-\varepsilon, x]$ or for all $y \in [x, x+\varepsilon)$, unless $\lambda = 0$ and $x = 0$ (see (19)),

(21) for each $a \in A$, the class continuation set $I_a = \{x \in S : f(x,a) \in \theta(a)\}$ is an open subset of $S$,

(22) if $y$ is a closed boundary point of action continuation set $I^a = \{x \in S : f(x,a) = a\}$ for some $a \in A$, then there exist $\hat{a} \in \theta(a)$ and $\varepsilon > 0$ such that $f(x,\hat{a}) = \hat{a}$ for all $x \in \{x \in S : |y - x| < \varepsilon \text{ and } x \notin I^a\}$ and $f(y,\hat{a}) = a$, where $\hat{a} = \lim_{s \to y,\ s \notin I^a} f(s,a)$.
s t Ia 

Interpret  f as  a rule  for  selecting  actions  through  time,  the 
action  selected  at  time  t being  f(x,a)  if  the  state  of  the  system 
and  control  mode  in  use  at  that  time  are  x and  a,  respectively.  Note 
that  this  rule  for  selection  of  actions  does  not  depend  on  time,  and 
thus  the  name  stationary  policy. 
For each $a \in A$, the set $I^a$ defined in (22) denotes those states for which $f$ will continue with control mode $a$. We call $I^a$ the action continuation set associated with mode $a$ under policy $f$. For each $a \in A$, the set $I_a$ defined in (21) represents those states for which $f$ will continue with an action from equivalence class $\theta(a)$. That is, given mode $a$ in the $k$-th equivalence class, the action $f(x,a)$ selected by $f$ at state $x$ will also belong to $A_k$ if $x$ is in $I_a$. Hence we term $I_a$ the class continuation set associated with mode $a$ under policy $f$. From conditions (18) through (22), we see that for each $a \in A$, set $I_a$ is the union of a finite number of intervals open in $S$, and action continuation set $I^a$ is contained in class continuation set $I_a$.
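To make the notion of a stationary policy and its continuation sets concrete, here is a small sketch with two modes, each forming its own equivalence class, so that action and class continuation sets coincide. The thresholds `S_LO` and `S_HI` and the policy itself are hypothetical choices for illustration, not a policy taken from the text; the sketch assumes reflection at zero.

```python
S_LO, S_HI = 1.0, 2.0   # hypothetical thresholds with S_LO < S_HI


def f(x, a):
    """Hypothetical two-mode stationary policy (each mode is its own class)."""
    if a == 1:
        return 1 if x < S_HI else 2   # leave mode 1 once x reaches S_HI
    return 2 if x > S_LO else 1       # leave mode 2 once x falls to S_LO


def continuation_points(a, grid):
    """Grid points lying in the action continuation set {x : f(x, a) = a}."""
    return [x for x in grid if f(x, a) == a]
```

Here the continuation set for mode 1 is $[0, 2)$ and for mode 2 is $(1, \infty)$, both open in $S = [0, \infty)$, and $f(x, f(x,a)) = f(x,a)$ holds at every state because the two thresholds are separated.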

For  example,  suppose  that  ^ = 1 (reflection  at  zero)  and  we 


have  three  control  modes  such  that  = K21  = K22  = = .v,, 

and  > 0.  We  then  have  two  equivalence  classes,  A^  = (1,2} 

and  Ag  = {3},  and  equivalence  class  switching  costs  C ^ 

C21  = Kji  = and  = C ^ = 0.  In  Figure  1 we  illustrate  a 

stationary  policy  for  this  particular  data.  This  stationary  policy  is 
given  by 


*33  = °'  K13  " *23  > °» 


f(x,l)  .. 


if 

X 

€ (0, s L ] 

if 

X 

£ (sps2^ 

if 

X 

£ (s2>3^ 

if 

X 

e t85,\] 

if 

X 

£ (%,*) 
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The  action  continuation  sets 


are  I1  = [O^]  U (b2,s  ), 


2 3 

I = (SpSg]  U (s^,»)  and  r - (8pS,_).  The  class  continuation  sets 
are  ^ = [0,8^  U (s^,-),  Ig  = [0,8^  U (s^,-)  and  = (s^), 

each  of  which  is  open  in  S.  Condition  (22)  then  needs  to  be  verified 

at  s^  (closed  boundary  point  of  1^)  and  at  s^  (closed  boundary 

2 

point  of  I ).  Indeed,  at  s,  we  have  that  lim  f(s,l)  = 2, 

i s -*  s^  J ’ 

s i I1 

f(x,2)  =2  on  (s^,Sg]  and  f(s^,2)  = 1.  At  Sg,  (22)  is  likewise 

satisfied  since  lim  f(s,2)  = 1,  f(s,l)  = 1 on  (sp>s,)  and 
s — > ^ 


f(Sp,1)  = 2. 


s i I2 


We now need some more notation concerning a stationary policy $f$. For each $a \in A$, let $0 = s_0^a \le s_1^a < s_2^a < \cdots < s_{n(a)}^a < s_{n(a)+1}^a = \infty$ be the finite number $n(a)$ of discontinuity points in $f(x,a)$. And for all $p = 0, 1, 2, \ldots, n(a)$, denote by $b_p^a$ the constant value of $f(\cdot,a)$ on the interval between $s_p^a$ and $s_{p+1}^a$, where we have not specified whether the endpoints of the interval are open or closed. The following two results will allow us to define the admissible strategies generated by a stationary policy $f$, and to characterize the corresponding values as the solutions of certain differential equations.


Theorem 3. Let $f$ be a stationary policy and fix $(x,a) \in S \times A$. Then there exists a unique admissible strategy $\pi$ such that

(23) $\theta(\pi(0+)) = \theta(f(x,a))$,

(24) $\pi(0+) = f(x,a)$ except for $x$ a closed boundary point of $I^a$,

and for all $t > 0$

(25) $\theta(\pi(t+)) = \theta(f(X(t|x,\pi), \pi(t)))$,

(26) $\pi(t+) = f(X(t|x,\pi), \pi(t))$ except for $X(t|x,\pi)$ a boundary point of $I^{\pi(t)}$,

where $X(\cdot|x,\pi)$ is the controlled process generated by strategy $\pi$ and initial state $x$.

Definition. We call $\pi$, characterized by (23) through (26), the admissible strategy corresponding to stationary policy $f$, initial state $x$ and initial control mode $a$.

Proof. Let $a_1 = f(x,a)$ and $p \in \{0, 1, \ldots, n(a_1)\}$ be such that $x$ falls in the interval between $s_p^{a_1}$ and $s_{p+1}^{a_1}$, and $b_p^{a_1} = a_1$. Define $\{\xi(t);\ t \ge 0\}$ to be the unique reflected Brownian Motion process

$\xi(t) = \psi(t) + \left[-\inf_{0 \le u \le t} \psi(u)\right]^+$,

where

$\psi(t) = x + \mu_{a_1} t + \sigma_{a_1} B(t)$ for all $t \ge 0$.

Let $\tau_1$ be the first hitting time of $s_p^{a_1}$ or $s_{p+1}^{a_1}$ by process $\xi$, and note that $\tau_1$ is a $\{\mathcal{F}_t;\ t \ge 0\}$ measurable stopping time. We now begin to construct process $\{W(t);\ t > 0\}$ and process $\{Z(t);\ t \ge 0\}$ by defining $W(t) = a_1$ for all $t \in (0, \tau_1]$ and $Z(t) = \xi(t)$ for all $t \in [0, \tau_1]$.

al  al 

Suppose  that  |(x^)  = s and  that  s is  not  a boundary  point 

al  al 

of  I . Thus  s is  a boundary  point  of  I , and  we  shall  iirst 
a.  p > 

1 al  al  al 

consider  the  case  where  s G I . We  have  in  s . then,  a closed 

P P ’ ’ 

al 

boundary  point  of  I , and  by  (22)  there  exists  a^  £ e(a^)  and 

clp 

f(s-^p  a^)  = a^.  Now  let  (i]r(t);  t > x^}  be  the  unique  process  from 
Theorem  3 in  Chapter  2 satisfying 


(27)  *(t)  = s 1 + / [ji_  *{t(u)  < s*1]  + u X{i|r(u)  > s^1) ]du 


p t.  a2 


P a. 


+ / [ cr  X(\|r(u)  < S*1}  + a X{\|r(u)  > s J-)]  dB(  u) 

” p 3 
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and  let  {|(t);  t > t^}  be  the  unique  reflected  process 


+ 

5(t)  = >ir(t)  + C inf  {*(«.)}]  , 

Lt,  < u < t J 


C^T1 


1 - - 


al  a2  r 

Define  x^  as  the  first  hitting  time  of  s ^ or  sp  by  process  g f 
and  note  that  x^  > a.s.  and  that  x^  is  c measurable. 

Thus  define  W(t)  = a 2 X{|(t)  < s*1}  + a ^ X{£(t)  > s*1}  on  ( t^, t2] 
and  Z(t)  = i(  t)  on  (T^XgJ. 
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x al  al 

If  6(x,)  = s s is  not  a boundary  point  I but  is  a 
v 1'  P ’ P a. 

al  al  / aI  1 

boundary  point  of  I , and  s f.  I , then  there  exists  a^  6 0(a^) 

- a2  al  a2 

and  p t (0,  1,  n(a2)}  such  that  s-  = s b-  = a2  and 

ax  + P P 

f ( Sp  f a^)  = a2.  Subsequently  we  would  alter  the  definitions  of  \|r(  t) 
and  W( t)  in  the  above  paragraph  to 


(2T) 


1 +.  / [a  X{t(  u)  < s + ti  X{^(  u)  > s MdU 

5 T1  L a2  p al  P J 

t r ai  ai  l 

+ / a X{i(u)  < s } + o X{*(u)  > s ) dB(u) 

T , L a2  P al  P j 


C^T1 > 


a1  - a1 

and  W(  t)  = a0  X(|(t)  < } + a^  X{£(t)  > s^}  on  (TpT2-*  All  other 

definitions  would  be  the  same. 

al  31 

Suppose  instead  that  £(t^)  = s and  that  s is  a boundary 

al  P P 

point  of  I . Let  a = f(  s a^.  By  virtue  of  condition  (21)^ 

d P 

there  then  exists  p £ (0^  1,  n( a2) } such  that  sa*  falls  in 

a2  a2  a2 

the  interior  of  the  interval  between  s-  and  s-  . and  b-  = a_, 

p p+1  p 2> 

a a a a 

„ i.  ^ i - ^ c e-  _ \ 


or  such  that  s = s-  , , b-  = a„  and  b-  . C 0(ao). 

p p+P  p 2 Pfl  d' 

a2  al  a2 

If  s-  < Sp  < s-+j^  we  proceed  as  in  the  first  paragraph  of  this  proof. 
That  is,  define  U(t);  t > t^)  to  be  the  reflected  process 


£(t)  = t(t)  + [-  inf  {*(«))]  , 

L T,  < U < t j 


C^T1  > 


1 - “ - 


where 
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T 


1 


(30)  t(t)  = 3p  +^a^(t-T1)  + O (B(t)  - B(ti))  , t > Tj 


Letting  xg  be  the  first  hitting  time  of  s-  or  s-^  by  process 

l,  we  then  define  processes  W and  Z by  W( t)  = ap  on  (x.,xp]  and 

al  a2  2 

z(  t)  = £(t)  on  (Tj, Tgl*  Sp  = Sp+i>  we  proceed  as  in  the  above 

paragraph.  That  is,  letting  a?  = b?^  and  p*  € {0,  1,  n(aj)} 

ai 

to  be  such  that  bp£  = a^,  we  define  (|(t);  t > x^  to  be  the  reflected 
process  (f(t);  t > where 

a.  t a.  a. 

(31)  *(t)  = s + / [u  X{\|r(u)  < s J + u X{ 4r( u)  > sj 


T1  ‘2 


)du 


c a a. 

+ / to  X('lr(u)  < s ) + a X{^(  u)  > s ) ]dB(  u) 

Qa  ■”  d a_  n ' 


Tl  a2  ' ' - ? ' a3  ’ ' ' P 

t > Tj^  . 

a 

We  then  define  processes  W and  Z by  W( t)  = a^  X{|(t)  < s } 

ai 

+ xU(t)  > sp  } and  Z(  t)  = C(t)  on  (x^x g],  where  Xg  is  the 

a2  a5 

first  hitting  time  of  s-  or  s_z  . by  process  £.  In  both  cases 

p p*+l 

of  this  paragraph,  stopping  time  Tg  is  such  that  Xg  > x^  almost 
surely. 

Similarly,  we  handle  all  of  the  possibilities  where  £(x^)  = sa+^. 
Then  in  conclusion  of  the  situations  depicted  above,  we  let  x = Z( Xg) 
and  a = W( Xg-)  and  follow  the  same  procedures  all  over  again.  That 
is,  if  x is  a boundary  point  of  I~  we  return  to  the  case  discussed 
in  the  previous  paragraph,  and  if  x is  not  a boundary  point  of  I~ 




between $\tau_n$ and $\tau_{n+1}$, is bounded away from zero almost surely. This ensures that $\tau_n \to +\infty$ almost surely as $n \to \infty$, and we have that $W$ is indeed a function from $(0,\infty)$ into the action space $A$.

Finally, if $\lambda = 1$ (reflection) we define process $\{\pi(t);\ t > 0\}$ as being exactly process $\{W(t);\ t > 0\}$, and process $\{X(t);\ t \ge 0\}$ as being exactly process $\{Z(t);\ t \ge 0\}$. If $\lambda = 0$ (absorption), we define $\{\pi(t);\ t > 0\}$ by $\pi(t) = W(t \wedge T)$ for all $t > 0$ and $\{X(t);\ t \ge 0\}$ by $X(t) = Z(t \wedge T)$ for all $t \ge 0$, where $T = \inf\{t \ge 0 : Z(t) = 0\}$.

Clearly $\pi(t)$ is $\mathcal{F}_t$ measurable for all $t > 0$, and since $(\tau_{n+1} - \tau_n)$ is bounded away from zero for every $n = 1, 2, 3, \ldots$, the function $\theta^* : (0,\infty) \to \{1, 2, 3, \ldots, M\}$ defined by $\theta^*(t) = \theta(\pi(t))$ has (a.s.) finitely many discontinuities in each finite interval of time. Therefore $\pi$ is an admissible strategy. By construction our process $Z$ is the solution process to Theorem 1 of this chapter for this $\pi$, so process $X$ is uniquely the controlled process generated by strategy $\pi$ and initial state $x$. Moreover processes $\pi$ and $X$ uniquely satisfy conditions (23) through (26), which is to say that processes $\pi$ and $X(\cdot|x,\pi)$ uniquely satisfy (23) - (26), as desired. □


Remark. The fact that $(\tau_{n+1} - \tau_n)$ in the proof above is bounded away from zero for every $n = 1, 2, 3, \ldots$ implies that there exists a constant $U < \infty$ such that $E[Q_{k\ell}(t|a,\pi)] \le Ut$, for all $t \ge 0$ and all $k, \ell \in \{1, 2, \ldots, M\}$ where $k \ne \ell$.


Theorem 4. Let $f$ be a stationary policy and for each $(x,a) \in S \times A$, let $\pi(x,a)$ denote the admissible strategy uniquely corresponding to $f$, $x$ and $a$. Let $V_f : S \times A \to \mathbb{R}$ be defined as $V_f(x,a) = V_{\pi(x,a)}(x,a)$ for each $(x,a) \in S \times A$. Then $V_f$ is the unique function $V : S \times A \to \mathbb{R}$ to satisfy the following for each $a \in A$:


(52)  V(.,a)  £ C?(S)  , 

(33) $|V(x,a) - hx/\alpha|$ is bounded for all $x \in S$,

(34) $D_a V(x,a) - \alpha V(x,a) + g(x,a) = 0$ for all $x$ in the interior of $I^a$, where $D_a$ denotes the differential operator $D_a = \mu_a \dfrac{d}{dx} + \dfrac{1}{2}\sigma_a^2 \dfrac{d^2}{dx^2}$,

(35) $V(x,a) = K_{a,f(x,a)} + V(x, f(x,a))$ for all $x \notin I^a$,

(36) $\lambda V'(0,a) - (1-\lambda) V(0,a) + (1-\lambda) R = 0$.
Proof. We begin by showing that $V_f$ satisfies conditions (32) through (36). From the note following Theorem 3 we can see that $E\left[\int_0^\infty e^{-\alpha t}\,dQ_{k\ell}(t|a,\pi)\right] < \infty$ for every $(x,a) \in S \times A$ and all distinct $k$ and $\ell$ in $\{1, 2, \ldots, M\}$. Hence by Theorem 2, $V_f$ satisfies (33) for each $a \in A$.

a 

Now fix $a \in A$, and note that $I^a$ is the union of a finite number of intervals in $S$, where the endpoints of each interval are possibly open or closed. Consider $x \in I^a$ with $s_p^a$ and $s_{p+1}^a$ being the endpoints of the interval of $I^a$ containing $x$. Let $\{Z_a(t);\ t \ge 0\}$ be the Brownian Motion starting in state $x$ with drift parameter $\mu_a$, variance parameter $\sigma_a^2$ and absorption at boundaries $s_p^a$ and $s_{p+1}^a$. Associate with process $Z_a$ the same linear holding costs, operational costs, and switching costs as with $X(\cdot|x,\pi)$, and include in this cost structure absorption costs $K_{a,f(s_p^a,a)} + V_f(s_p^a, f(s_p^a,a))$ and $K_{a,f(s_{p+1}^a,a)} + V_f(s_{p+1}^a, f(s_{p+1}^a,a))$ at boundary points $s_p^a$ and $s_{p+1}^a$ respectively. Let $V_a(x)$ denote the conditional expectation of the total discounted cost generated by $Z_a$ in this setting. Define function $F_a : [0,T] \times [s_p^a, s_{p+1}^a] \to \mathbb{R}$ by

$F_a(t,x) = e^{-\alpha t}\, V_a(x)$, $(t,x) \in [0,T] \times [s_p^a, s_{p+1}^a]$,

where $T = \inf\{t \ge 0 : Z_a(t) \notin (s_p^a, s_{p+1}^a)\}$. Now since


V*>  . ■{/ 


e-at[hZa(t)  + rjdt 
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(37) 


and $Z_a$ is continuous, we have that $V_a \in C^2([s_p^a, s_{p+1}^a])$. Hence $F_a$ satisfies the conditions of Theorem 2 in Chapter 2 (our extension of Itô's lemma), and the desired result here is that

(38) $e^{-\alpha T} V_a(Z_a(T)) = V_a(x) + \int_0^T \left[-\alpha e^{-\alpha t} V_a(Z_a(t)) + \tfrac{1}{2} e^{-\alpha t} V_a''(Z_a(t))\,\sigma_a^2 + e^{-\alpha t} V_a'(Z_a(t))\,\mu_a\right] dt + \int_0^T e^{-\alpha t} V_a'(Z_a(t))\,\sigma_a\,dB(t)$.

The fact that $V_a \in C^2([s_p^a, s_{p+1}^a])$ results in

$\int_0^t e^{-2\alpha u}\left[V_a'(Z_a(u))\right]^2 du < \infty$ for all $t \in [0,T]$,

thereby leading to

$E\left[\int_0^T e^{-\alpha t} V_a'(Z_a(t))\,\sigma_a\,dB(t)\right] = 0$.

We therefore take expectations in (38) and see that

(39) $E\left[e^{-\alpha T} V_a(Z_a(T))\right] = V_a(x) + E\left[\int_0^T e^{-\alpha t}\left[D_a V_a(Z_a(t)) - \alpha V_a(Z_a(t))\right] dt\right]$.

But the expectation on the left hand side of (39) is exactly


{•'a'|K.,[|Zi(!),.)  - VVT>.  £<Za<T>.  *»!]  - 
so combining (37) and (39) we get that

$E\left[\int_0^T e^{-\alpha t}\left[\alpha V_a(Z_a(t)) - D_a V_a(Z_a(t))\right] dt\right] = E\left[\int_0^T e^{-\alpha t}\left[h Z_a(t) + r_a\right] dt\right]$.

This means that $D_a V_a(x) - \alpha V_a(x) + g(x,a) = 0$ for all $x \in [s_p^a, s_{p+1}^a]$, and since $V_a(x) = V_f(x,a)$ on this interval of $S$, we have that $V_f$ satisfies condition (34).

Now suppose that $x \notin I^a$. Then it follows from the definition of value $V_f(x,a)$ and Theorem 3 that

$V_f(x,a) = K_{a,f(x,a)} + V_f(x, f(x,a))$,

and so $V_f$ also satisfies (35). Furthermore, if $x$ is a continuity point of $f(\cdot, f(x,a))$, then $y \in I^{f(x,a)}$ for all $y$ in some open interval about $x$. Therefore, as discussed in the previous paragraph, $V_f'(y, f(x,a))$ and $V_f''(y, f(x,a))$ exist for all $y$ in this interval, where $V_f'$ and $V_f''$ denote the first and second partial derivatives with respect to the first argument of $V_f$. Hence $V_f'(x,a)$ and $V_f''(x,a)$ both exist.

We  have  thus  far  shown  V£(x,a)  and  V^(x,a)  to  exist  for  all 
x in  the  interior  of  la  and  all  x in  the  interior  of  I^x>a\ 

This  means  that  V^(x,a)  and  V^(x,a)  also  exist  if  x is  a boundary 
point  of  Suppose  now  that  x is  a closed  boundary  point  of  la. 

Again  let  sa  and  Sp+^  be  endpoints  of  the  interval  of  la 


containing  x,  and  assume  that  x = sa.  Then  there  exists  j G 0(a) 


and  p G {0,  1,  2,  ...,  n(j)}  such  that  s^+^  = sa,  bj-  = j 



and 


I 


f(8p+l>  ^ = a‘  Following  the  proof  of  Theorem  3,  let  {*(0;  t > 0) 


i 3 

be  the  unique  process  defined  for  y C ( sj^,  s ^)  as 


t)  = y + / [u.  X{t(u)  < sa}  + u X(\|r(  u)  > sa}ldu 

0 J pa  - p 


+ / [a  X{\Jr(  u)  < sa)  + a X{ \Jr(  u)  > s®]]dB(u)  , 
0 J p 


and  U(t);  t > 0}  as  the  process  \|r  appropriately  absorbed  or  reflected 

at  iero.  Impose  on  process  £ the  linear  holding  costs,  operational 

costs,  and  switching  costs  associated  with  process  X( • |x,  tt),  and 

absorbtion  upon  hitting  s^  or  sa  . at  costs  K.  ,,  j ..  + VJ  s- . f(  s^ . j)) 

P P+1  fv  P’  v P,J/' 

and  Ka,f(s^+1,a)  + VSp+l>  KSp+1, a)),  respectively.  Let  Vfij(y) 

denote  the  conditional  expectation  of  the  total  discounted  cost  generated 

by  5 in  this  setting,  and  we  find  that  V (y)  is  continuously 

aJ 

4 a 

differentiable  with  respect  to  y on  ( sj-,  Sp+^)  and  has  a second 

derivative  except  at  the  points  s^.  sa  and  sa  , . Therefore  since 

P’  P P+1 

t 3 

Vaj  Sp)  = Vf(x>a),  we  have  that  V^(x,a)  exists,  though  not  necessarily 
so  V'j(x,a).  Similarly  if  x = s®+^,  and  we  have  shown  that  V£(x,a) 
exists  for  all  x £ S and  V^(x,a)  exists  for  all  x ttiat  are  not 
discontinuity  points  of  f(x,a).  Hence  Vf  satisfies  (32). 

Finally, in the case of absorption, property (19) of a stationary policy implies that $V_f(0,a) = R$, thereby validating (36) for $\lambda = 0$. In the case of reflection we have two possibilities to consider, that of state zero included in set $I^a$ and that of zero lying outside of $I^a$. If $0 \in I^a$, then $V_f'(0,a) = 0$ by application of (34) to the interval between $s_0^a$ and $s_1^a$. If $0 \notin I^a$, then $y \in I^{f(0,a)}$ for all $y$ in some interval $[0, \varepsilon)$, and thus $V_f'(0,a) = V_f'(0, f(0,a)) = 0$, where the second equality follows by the argument of the previous sentence. So (36) holds true also for $\lambda = 1$, and we conclude that $V_f$ is indeed a solution to conditions (32) through (36).

Suppose now that function $V : S \times A \to \mathbb{R}$ also satisfies (32) - (36). Then letting $\Delta(x,a) = V(x,a) - V_f(x,a)$ for all $(x,a)$ in $S \times A$, we would find that the function $\Delta$ satisfies these conditions for all $a \in A$:

(33') $|\Delta(x,a)|$ is bounded in $S$,

(34') $D_a \Delta(x,a) - \alpha \Delta(x,a) = 0$, for all $x \in I^a$,

(35') $\Delta(x,a) = \Delta(x, f(x,a))$, for all $x \notin I^a$, and

(36') $\lambda \Delta'(0,a) - (1-\lambda)\Delta(0,a) = 0$.

A solution to the second order differential equation (34') would imply that

$\Delta(x,a) = \Gamma_1 e^{\rho_1 x} + \Gamma_2 e^{\rho_2 x}$, for all $x \in I^a$,

where $\rho_1$ is the positive root and $\rho_2$ the negative root of the quadratic equation $\mu_a \rho + \tfrac{1}{2}\sigma_a^2 \rho^2 - \alpha = 0$. But in order to maintain the required bound (33') and boundary conditions (35') and (36'), it must be that $\Gamma_1 = \Gamma_2 = 0$. Hence we have that $\Delta(x,a) = 0$ for all $(x,a) \in S \times A$, and $V_f$ is the unique solution to conditions (32) through (36). □
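As an illustration of how conditions (33), (34) and (36) pin down a return function, consider the simplest stationary policy, one that never leaves a single mode, under reflection at zero. The sketch below is an assumed special case, not a computation from the text: it takes a particular solution that is linear in $x$, adds the decaying exponential built from the negative root of the quadratic above, and chooses its coefficient so that the derivative vanishes at zero. Function and parameter names are hypothetical.

```python
import math


def single_mode_return(mu, sigma, alpha, h, r):
    """Return function V for a policy that stays in one mode, reflection at 0.

    Solves mu*V' + 0.5*sigma^2*V'' - alpha*V + h*x + r = 0 with V'(0) = 0
    and |V(x) - h*x/alpha| bounded (illustrative special case only).
    """
    # negative root of 0.5*sigma^2*rho^2 + mu*rho - alpha = 0
    rho2 = (-mu - math.sqrt(mu * mu + 2.0 * alpha * sigma * sigma)) / (sigma * sigma)
    const = (r + mu * h / alpha) / alpha     # constant part of particular solution
    coef = -h / (alpha * rho2)               # makes V'(0) = h/alpha + coef*rho2 = 0

    def V(x):
        return h * x / alpha + const + coef * math.exp(rho2 * x)

    return V, rho2
```

The positive root is discarded because its exponential would violate the bound in (33); the same reasoning drives the uniqueness argument above.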
To emphasize the special nature of stationary policies within the class of all admissible strategies we call $V_f$, as characterized by Theorem 4 above, the return function for stationary policy $f$. Thus stationary policy $f$ is $(x,a)$-optimal if $V_f(x,a) = V^*(x,a)$. Finally we conclude this chapter with the concept of everywhere optimality by further calling policy $f$ optimal if it is $(x,a)$-optimal for all $(x,a) \in S \times A$.
CHAPTER 4


OPTIMAL  STATIONARY  POLICIES 


In this chapter we derive a necessary and sufficient condition for a given stationary policy to be optimal for the control problem formulated in Chapter 3. It will not be shown that there always exists an optimal stationary policy. We conjecture, however, that this is true, and in Chapter 5 we will explicitly produce a stationary policy that is optimal for the general problem when there are two available control modes ($N = 2$).


4.1.  Optimality  of  Stationary  Policies 


We  will  first  prove  a preliminary  proposition.  The  main  theorem 


will  then  be  stated  and  proved. 


Proposition  1.  Suppose  that  V : SxA  -»IR  satisfies  the  following  for 
all  a € A: 


V(-,*)  t C„(S)  , 


$|V(x,a) - hx/\alpha|$ is bounded for all $x \in S$,

$V(x,a) \le K_{aj} + V(x,j)$, for all $x \in S$ and all $j \in A$,


Proof. We first note that if function $V : S \times A \to \mathbb{R}$ satisfies (3), then for each $a \in A$, $V(x,j) = V(x,a)$ for all $x \in S$ and all $j \in \theta(a)$. Therefore, we may restate conditions (1) through (5) as


d') 

V(-,k)  € cf(S)  , 

k £ {1,  2,  ... 

Mi  , 

(2') $|V(x,k) - hx/\alpha|$ is bounded for all $x \in S$, $k \in \{1, 2, \ldots, M\}$,

(3') $V(x,k) \le C_{k\ell} + V(x,\ell)$, for all $x \in S$ and all $k, \ell \in \{1, 2, \ldots, M\}$,

(4') $D_j V(x,k) - \alpha V(x,k) + g(x,j) \ge 0$, for all $x \in S$ and all $j \in A_k$, $k \in \{1, 2, \ldots, M\}$,

(5') $\lambda V'(0,k) - (1-\lambda) V(0,k) + (1-\lambda) R = 0$, $k \in \{1, 2, \ldots, M\}$,

where for each $(x,k) \in S \times \{1, 2, \ldots, M\}$, we define $V(x,k)$ as the common value of $V(x,j)$ for all $j \in A_k$.

Fix  (x;a)  £ SxA  and  let  tt  be  an  arbitrary  admissible  strategy. 


The function $\hat{\theta} : [0,\infty) \to \{1, 2, \ldots, M\}$ defined by

$\hat{\theta}(t) = \theta(a)$ if $t = 0$, and $\hat{\theta}(t) = \theta(\pi(t))$ if $t > 0$,

has finitely many discontinuities in each finite interval of time. So letting $T_1, T_2, \ldots, T_n, T_{n+1}, \ldots$ denote the discontinuity points of $\hat{\theta}$, we have that $0 = T_0 \le T_1 < T_2 < \cdots < T_n < T_{n+1} < \cdots$ and $T_n \to +\infty$ almost surely.

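Now by inspection of definition (15) in Chapter 3 we see that

(6) $V_\pi(x,a) = E\left[\sum_{n=0}^{\infty}\left[\int_{T_n}^{T_{n+1}} e^{-\alpha t}\, g(X(t|x,\pi), \pi(t))\,dt + e^{-\alpha T_{n+1}}\, C_{\hat{\theta}(T_{n+1}-),\,\hat{\theta}(T_{n+1}+)}\right]\right]$.

Application of condition (4') leads us to

(7) $V_\pi(x,a) \ge E\left[\sum_{n=0}^{\infty}\left[\int_{T_n}^{T_{n+1}} e^{-\alpha t}\left[\alpha V(X(t|x,\pi), \hat{\theta}(t)) - D_{\pi(t)} V(X(t|x,\pi), \hat{\theta}(t))\right] dt + e^{-\alpha T_{n+1}}\, C_{\hat{\theta}(T_{n+1}-),\,\hat{\theta}(T_{n+1}+)}\right]\right]$,

since $\theta(\pi(t))$ is constant on $(T_n, T_{n+1})$ for every $n = 0, 1, 2, \ldots$.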

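Fix $n \in \{0, 1, 2, \ldots\}$. For all $t \in [T_n, T_{n+1}]$ we have that

$X(t|x,\pi) = X(T_n|x,\pi) + \int_{T_n}^{t} \mu_{\pi(u)}\,du + \int_{T_n}^{t} \sigma_{\pi(u)}\,dB(u) + Y(t) - Y(T_n)$,

where $X(\cdot|x,\pi)$ and $Y$ uniquely satisfy Theorem 1 in Chapter 3. Define function $F_n : (T_n, T_{n+1}) \times S \to \mathbb{R}$ by

$F_n(t,x) = e^{-\alpha t}\, V(x, \hat{\theta}(t))$ for all $(t,x) \in (T_n, T_{n+1}) \times S$.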

’ n>  n+1'  * 


By  condition  (!•),  V(.,  e(t))  £ Cf(S)  for  all  t £ (T  ,T  ,)•  Thus 

n7  n+ 1' 

Fn  satisfies  the  conditions  of  Theorem  2 in  Chapter  2 and  the  result 
here  is  as  follows: 


(8)  V(X(c|x,  ir),  S(  t)> 


V(X(Tjx,  n),  S(T*)) 


-0!u  - 


+ ; l^-ae  “ V(X(u|x,  TT),  e(  u))  + i e"au  V"(X(u|x,  7 r),  0(u)) 


+ e'au  V’(X(u|x,  tt),  e(u))  U?r(u)]  du 


+ / e"au  V'(X(u|x,  TT),  e(u))  dB(u) 


+ / e'au  A/*(X(u|x,  t),  0(U))  dY(u)  , for  all  t £ (T  T ,) 
T n'  n+1' 


Equivalently, 


(9)  e’at  V(X(t|x,  t r),  0(t)) 


-or 

= e n V(X(Tn|x,  7T),  0(Tn+)) 


+ / V(X(u|x,  t r),  5(u))  - OV(X(u|x,7r),  0(u))j  du 

n 

+ / e'au  V’(X(u|x,  tt),  0(u))ct^uj  dB( u) 
n 


e"au  V '(X(  u |x, 


tt),  0(u))  dY(  u) 


for  all  t € (T  . T , 
v n>  n+ 1 


If $\lambda = 0$, then the process $Y$ is everywhere zero, and thus the last integral on the right hand side of (9) equals zero. If $\lambda = 1$, then the non-negative process $Y$ grows only where $X(\cdot|x,\pi)$ is zero. In this case the same integral becomes

$\int_{T_n}^{t} e^{-\alpha u} V'(0, \bar\theta(u))\, \chi\{X(u|x,\pi) = 0\}\, dY(u)$

and likewise disappears, since $\lambda = 1$ implies that $V'(0, \bar\theta(u)) = 0$ for all $u \in (T_n, T_{n+1})$ by virtue of condition (5'). Conditions (1') and (2'), the continuity of $X(\cdot|x,\pi)$, and the boundedness of $\sigma_{\pi(\cdot)}$ guarantee that

$\int_{T_n}^{t} e^{-2\alpha u} E\big[ [V'(X(u|x,\pi), \bar\theta(u)) \sigma_{\pi(u)}]^2 \big]\, du < \infty$ on $(T_n, T_{n+1})$,


Finally, condition (3') implies that

$V(X(T_n|x,\pi), \bar\theta(T_n-)) \le C_{\bar\theta(T_n-),\, \bar\theta(T_n+)} + V(X(T_n|x,\pi), \bar\theta(T_n+))$ for each $n$,

so that we conclude that

$V_\pi(x,a) \ge V(x, \theta(a)) + E\Big[ \sum_{n=1}^{\infty} e^{-\alpha T_n} \Big[ V(X(T_n|x,\pi), \bar\theta(T_n+)) - \big[ C_{\bar\theta(T_n-),\, \bar\theta(T_n+)} + V(X(T_n|x,\pi), \bar\theta(T_n+)) \big] + C_{\bar\theta(T_n-),\, \bar\theta(T_n+)} \Big] \Big] = V(x, \theta(a))$.

And since $V(x, \theta(a)) = V(x,a)$, we have as desired that $V_\pi(x,a) \ge V(x,a)$ and $V_*(x,a) \ge V(x,a)$. □

Theorem 1. Let $f$ be a fixed stationary policy and denote by $V_f(x,a)$ its return function on $S \times A$. Then $f$ is optimal if and only if the following three conditions are satisfied for each $a \in A$:

(13)  $V_f(x,a) = \min_{j \in A} \{K_{aj} + V_f(x,j)\}$, for all $x \in S$,

(14)  $\min_{j \in L(x,a)} \{D_j V_f(x,j) - \alpha V_f(x,j) + g(x,j)\} = 0$, for all $x \in S \setminus \{0\}$,

where $L(x,a) = \{j \in A : V_f(x,a) = K_{aj} + V_f(x,j)\}$ and $D_j$ is as before, and

(15)  $\lambda V_f'(0,a) - (1-\lambda) V_f(0,a) + (1-\lambda) R = 0$.


Proof. From Theorem 4 in Chapter 3 we have that $V_f(\cdot,a) \in C^2_*(S)$ for each $a \in A$ and that $|V_f(x,a) - hx/\alpha|$ is bounded in $S \times A$. Since $\theta(a) \subset L(x,a)$ for all $(x,a) \in S \times A$, condition (14) here implies condition (4) in Proposition 1. Hence all of the conditions in Proposition 1 are satisfied, which immediately results in $V_f(x,a) = V_*(x,a)$ everywhere on $S \times A$.

Suppose now that stationary policy $f$ is optimal. Condition (15) is exactly the boundary condition satisfied by all stationary policies, so it holds for an optimal policy.

If (13) fails, then there exist $x \in S$ and $a$ and $j$ in $A$ such that $V_f(x,a) > K_{aj} + V_f(x,j)$. Then since $V_f(\cdot,a)$ is continuous on $S$, there exists an $\epsilon > 0$ such that the following hold:

(16)  $V_f(y,a) > K_{aj} + V_f(y,j)$, for all $y \in [x-\epsilon, x+\epsilon]$,

and

< 17>  [K»j  * $ [l  ' 'I''”]]  - £ E[Te-0^] 
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• . 

where $\{\psi(t);\ t \ge 0\}$ is the process $\psi(t) = x + \mu_j t + \sigma_j B(t)$ and $T$ is the stopping time $T = \inf\{t \ge 0 : \psi(t) \notin [x-\epsilon, x+\epsilon]\}$. Now define the admissible strategy $\pi$ by


(18)  $\pi(t) = j$, for $t \in (0,T]$,

(19)  $\theta(\pi(t+)) = \theta(f(X(t|x,\pi), \pi(t)))$, for $t > T$,

(20)  $\pi(t+) = f(X(t|x,\pi), \pi(t))$, for $t > T$ except for $X(t|x,\pi)$ a boundary point of $I_{\pi(t)}$,

where $X(\cdot|x,\pi)$ is the control process generated by $\pi$ and $x$. That is, strategy $\pi$ uses action $j$ until time $T$ and then follows policy $f$ ever afterward. Hence

$V_\pi(x,a) = K_{aj} + E\Big[ \int_0^T e^{-\alpha t} g(X(t|x,\pi), \pi(t))\, dt + e^{-\alpha T} K_{j,\, f(X(T|x,\pi),\, j)} + e^{-\alpha T} V_f(X(T|x,\pi), f(X(T|x,\pi), j)) \Big]$

$= K_{aj} + E\Big[ \int_0^T e^{-\alpha t} [h\psi(t) + r_j]\, dt \Big] + E[e^{-\alpha T} V_f(\psi(T), j)]$,

and using (16) and (17) we have that

$V_\pi(x,a) < K_{aj} + E\Big[ \int_0^T e^{-\alpha t} [h\psi(t) + r_j]\, dt \Big] + E\big[ e^{-\alpha T} [V_f(\psi(T), a) - K_{aj}] \big] < V_f(x,a)$.

This, however, contradicts the optimality of $f$, proving that (13) must hold.

If (14) fails, then there exist $x \in S \setminus \{0\}$ and $a$ and $j$ in $A$ such that $V_f(x,a) = K_{aj} + V_f(x,j)$ and $D_j V_f(x,j) - \alpha V_f(x,j) + g(x,j) < 0$. Since $V_f(\cdot,j) \in C^2_*(S)$, there exists an $\epsilon > 0$ such that

$D_j V_f(y,j) - \alpha V_f(y,j) + g(y,j) < 0$, for all $y \in [x-\epsilon, x+\epsilon]$,

and we define process $\psi$, stopping time $T$, and admissible strategy $\pi$ exactly as in the previous paragraph. Now,

$V_\pi(x,a) = K_{aj} + E\Big[ \int_0^T e^{-\alpha t} g(\psi(t), j)\, dt \Big] + E[e^{-\alpha T} V_f(\psi(T), j)]$

$< K_{aj} + E\Big[ \int_0^T e^{-\alpha t} [\alpha V_f(\psi(t), j) - D_j V_f(\psi(t), j)]\, dt \Big] + E[e^{-\alpha T} V_f(\psi(T), j)]$.

But as we saw in the proof of Proposition 1, we can apply our extension of Ito's lemma (Theorem 2 in Chapter 2) to get that

$E[e^{-\alpha T} V_f(\psi(T), j)] = V_f(x,j) + E\Big[ \int_0^T e^{-\alpha t} [D_j V_f(\psi(t), j) - \alpha V_f(\psi(t), j)]\, dt \Big]$.

Hence

$V_\pi(x,a) < K_{aj} + V_f(x,j) = V_f(x,a)$,

and we have again contradicted the optimality of $f$, thereby concluding that (14) is also necessary for stationary policy $f$ to be optimal. □

4.2.  An Interpretation of the Optimality Conditions

We would now like to interpret the optimality conditions of Theorem 1 and the proof given there. Let $V_f : S \times A \to \mathbb{R}$ be the return function corresponding to an optimal stationary policy $f$. Assume an initial starting state $x$ and an initial control mode $a \in A$.

Suppose that you, as the controller, switch immediately to mode $j$ at time zero, continue to use $j$ over the interval $[0,t]$, and thereafter follow policy $f$. Letting $U_{aj}(x,t)$ denote your expected total discounted cost we have

(21)  $U_{aj}(x,t) = K_{aj} + E\Big[ \int_0^t e^{-\alpha u} [h X_j(u) + r_j]\, du + e^{-\alpha t} V_f(X_j(t), j) \Big]$,

where $\{X_j(t);\ t \ge 0\}$ is the Brownian Motion process starting in state $x$ with drift parameter $\mu_j$ and variance parameter $\sigma_j^2$, properly reflected or absorbed at zero. Using Theorem 2 in Chapter 2 (Ito's lemma) to evaluate $E[V_f(X_j(t), j)]$ in (21) above, we can approximate $U_{aj}(x,t)$ by

$K_{aj} + hxt + r_j t + (1 - \alpha t)[V_f(x,j) + t D_j V_f(x,j)] + o(t)$,

since

$E\Big[ \int_0^t e^{-\alpha u} [h X_j(u) + r_j]\, du \Big] = hxt + r_j t + o(t)$

and

$E[e^{-\alpha t} V_f(X_j(t), j)] = [1 - \alpha t + o(t)][V_f(x,j) + t D_j V_f(x,j) + o(t)]$.

Therefore,

$U_{aj}(x,t) = [K_{aj} + V_f(x,j)] + [D_j V_f(x,j) - \alpha V_f(x,j) + hx + r_j] t + o(t)$,

and for $V_f(x,a)$ to be the optimal return it must be that

$V_f(x,a) = \min_{j \in A} U_{aj}(x,t)$

for all sufficiently small $t$.

Hence we can summarize our optimality conditions (13) and (14) of Theorem 1 by demanding that $V_f(x,a)$ satisfy the single condition

(22)  $\min_{j \in A} \big\{ [K_{aj} + V_f(x,j) - V_f(x,a)] + t [D_j V_f(x,j) - \alpha V_f(x,j) + g(x,j)] \big\} = 0$, for all small enough $t$.

We call (22) a lexicographic minimum condition since it requires first that

(23)  $V_f(x,a) = \min_{j \in A} \{K_{aj} + V_f(x,j)\}$,

and then that

(24)  $\min_{j \in L(x,a)} \{D_j V_f(x,j) - \alpha V_f(x,j) + g(x,j)\} = 0$,

where $L(x,a)$ is the set of $j \in A$ that achieve the minimum in (23). Equation (22) is the Bellman equation of dynamic programming, specialized to our control problem.
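The two-stage structure of the lexicographic minimum can be made concrete with a small check at a single fixed state $x$. The sketch below is illustrative only: the function name and its inputs (a switching cost matrix `K`, the values `V[j]` standing for $V_f(x,j)$, and `G[j]` standing for $D_j V_f(x,j) - \alpha V_f(x,j) + g(x,j)$) are hypothetical, and modes are indexed from 0 purely for convenience.

```python
def lexicographic_check(a, K, V, G, tol=1e-9):
    """Check conditions (23) and (24) for current mode `a` at one fixed state x.

    K[a][j] -- switching cost from mode a to mode j (hypothetical input)
    V[j]    -- value V_f(x, j) at the fixed state x
    G[j]    -- D_j V_f(x, j) - alpha * V_f(x, j) + g(x, j) at the same x
    """
    modes = range(len(V))
    # First stage: minimise the zeroth-order term K_aj + V_f(x, j).
    best = min(K[a][j] + V[j] for j in modes)
    # L(x, a): the modes that achieve the first-stage minimum.
    L = [j for j in modes if abs(K[a][j] + V[j] - best) <= tol]
    first_ok = abs(V[a] - best) <= tol               # condition (23)
    # Second stage: among the tied modes, the first-order term must have minimum zero.
    second_ok = abs(min(G[j] for j in L)) <= tol     # condition (24)
    return L, first_ok, second_ok


# Illustrative numbers only: both modes tie in the first stage.
K = [[0.0, 1.0], [1.0, 0.0]]
V = [2.0, 1.0]
G = [0.5, 0.0]
result = lexicographic_check(0, K, V, G)
```

Only the modes that tie in the first stage enter the second minimisation, which is exactly why (22) is read lexicographically rather than as a single weighted sum.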
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CHAPTER  5 


SOME  EXPLICIT  SOLUTIONS 

In this chapter we present some explicit solutions to the optimal control problems formulated in Chapter 3. That is, using the necessary and sufficient optimality conditions developed in Chapter 4 we construct stationary policies that are optimal. We deal with the system that has two available control modes, i.e., $N = 2$ and $A = \{1,2\}$. The two control modes are characterized by the drift and variance parameter pairs $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, and we have labelled the modes 1 and 2 so that $\mu_1 \ge \mu_2$.

In the first section we deal with an absorbing barrier at the boundary and the particular cost structure of linear holding cost rate $h = 0$, operational cost rates $r_1 = r_2 = 0$, boundary cost $R \ne 0$, and switching costs $K_{12} = K_{21} = 0$. We call this form of our control problem a death penalty problem, and we show the simplest of stationary policies, one that forever uses the same control mode regardless of initial state and initial mode, to be optimal. When $R$ is positive, it will be optimal to always use mode 1 if and only if $\mu_1$, $\sigma_1^2$, $\mu_2$, $\sigma_2^2$ and interest rate $\alpha$ satisfy specified relationships. However, when $R$ is negative those same parameter combinations lead to the optimality of the policy that always uses mode 2.

Section 5.2 treats the absorbing barrier problem with zero switching costs and general cost parameters $h$, $r_1$, $r_2$ and $R$. We immediately prove this optimal control problem to be equivalent to one in which there is absorption, no switching costs, no linear holding costs, one zero operational cost, and one non-negative operational cost. We then determine an optimal stationary policy that selects actions as a function of state only and that is characterized by a single critical number $z \in S$. One control mode is used whenever the state of the system is above level $z$, and the other mode is used when the state is below the critical level. For certain realizations of the cost and diffusion parameters, $z$ will be zero and our optimal policy will be one that simply uses the same control mode everywhere on $S$. For other parameter situations, the critical number will be positive and explicitly stated as the unique solution to a complicated transcendental equation.

In Section 5.3 we present an optimal stationary policy and optimal return function for the control problem with no switching costs and reflection at the boundary. With a reflecting barrier the controller must be concerned with controlling linear holding costs, and to avoid the computational complexities seen in Section 5.2, we will assume zero operational costs. We will show that if $\sigma_1^2 \ge \sigma_2^2$, then an optimal policy is to always use mode 2. If $\sigma_1^2 < \sigma_2^2$, we have what we call a tortoise-hare problem. (Mode 1 is the "tortoise" and mode 2 is the "hare".) In this case our optimal policy is also a single critical number policy, and again the critical number $z$ is the unique positive solution to a transcendental equation.

In Section 5.4 we add to the tortoise-hare problem of Section 5.3 a positive symmetric switching cost $K = K_{12} = K_{21} > 0$. With such switching costs we show that there exists an optimal stationary policy that is a function of both current state and current mode. This optimal policy involves two critical numbers $z$ and $Z$, where $0 < z < Z < \infty$. Control mode 2 is used whenever the state of the system is above $Z$ and mode 1 is used whenever the state falls short of $z$. When the state is in between the critical numbers, the controller maintains the control mode currently in use. As in the case of zero switching costs, the critical numbers are characterized by complicated formulas.


5.1.  A Death Penalty Problem

Suppose that we have a two mode system, absorption at zero, and only a non-zero cost $R$ at the boundary. That is, the linear holding cost rate $h$, operational cost rates $r_1$ and $r_2$, and switching costs $K_{12}$ and $K_{21}$ are all zero. Let $\pi$ be any admissible strategy and, given initial state $x \in S$, define the random variables

$T_x(y) = \inf\{t \ge 0 : X(t|x,\pi) \le x - y\}$, for all $y \in [0,x]$.

Corresponding to $\pi$ then is the value function

(1)  $V_\pi(x,a) = R\, E[e^{-\alpha T_x(x)}]$, for all $(x,a) \in S \times A$.

Let us first consider the stationary policy $f_1$ of always using control mode 1. Since the switching costs are zero here, the return function associated with $f_1$ is such that $V_{f_1}(x,1) = V_{f_1}(x,2)$ for all $x \in S$. Therefore we can suppress the second argument in $V_{f_1}$ and represent the return function as a function $V_{f_1} : S \to \mathbb{R}$ of initial state only.


For $(x,a) \in S \times A$ let $\pi(x,a)$ be the unique admissible strategy corresponding to policy $f_1$, $x$ and $a$. Using the strong Markov property of $X(\cdot|x,\pi(x,a))$ we then have that

$E[\exp(-\alpha(T_x(y) + T_x(x) - T_x(y)));\ T_x(y) < \infty] = E[\exp(-\alpha T_x(y));\ T_x(y) < \infty]\ E[\exp(-\alpha(T_x(x) - T_x(y)));\ T_x(y) < \infty]$, for all $y \in [0,x]$.

Now since $X(\cdot|x,\pi(x,a))$ has stationary and independent increments, the random variable $[T_x(x) - T_x(y)]$ has the same distribution as the random variable $T_{x-y}(x-y)$. Therefore

$E[\exp(-\alpha T_x(x))] = E[\exp(-\alpha T_x(y))]\ E[\exp(-\alpha T_{x-y}(x-y))]$, for all $y \in [0,x]$.

Thus we have the exponential function

(2)  $V_{f_1}(x) = R\, E[\exp(-\alpha T_x(x))] = R e^{-\beta x}$, for all $x \in S$,

where $\beta$ is a real-valued function of the parameters and is non-negative to ensure that the expectation in (2) decreases as $x$ increases on $S$. Substituting (2) into Theorem 4 of Chapter 3 we see that

$R e^{-\beta x} [-\mu_1 \beta + \tfrac12 \sigma_1^2 \beta^2 - \alpha] = 0$, for all $x \in S$,

and hence $\beta$ must solve the quadratic equation

$\alpha + \mu_1 \beta - \tfrac12 \sigma_1^2 \beta^2 = 0$.

Since we desire $\beta$ to be non-negative, it must be that

$\beta = \dfrac{\mu_1 + \sqrt{\mu_1^2 + 2\alpha\sigma_1^2}}{\sigma_1^2}$,

and we have completely determined the return function for policy $f_1$.
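As a quick numerical companion to the derivation of $\beta$, the following sketch evaluates the non-negative root and the resulting return function $R e^{-\beta x}$. The function names and the sample parameter values are illustrative assumptions, not part of the original text.

```python
import math


def positive_root(mu, var, alpha):
    """Non-negative root b of alpha + mu*b - 0.5*var*b**2 = 0 (var > 0, alpha > 0)."""
    return (mu + math.sqrt(mu * mu + 2.0 * alpha * var)) / var


def return_always_one_mode(x, R, mu, var, alpha):
    """R * exp(-b * x): discounted boundary cost when a single mode is used forever."""
    return R * math.exp(-positive_root(mu, var, alpha) * x)


# Illustrative parameters (assumed, not taken from the text).
mu1, var1, alpha = 1.0, 1.0, 0.1
beta = positive_root(mu1, var1, alpha)
# Residual of the quadratic; it should vanish up to rounding error.
residual = alpha + mu1 * beta - 0.5 * var1 * beta ** 2
```

Checking the residual of the quadratic is a cheap way to catch a sign slip between the two roots, since only the root with the plus sign is non-negative for every drift.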

Proceeding to evaluate the necessary and sufficient optimality conditions, we have

$D_1 V_{f_1}(x) - \alpha V_{f_1}(x) = R e^{-\beta x} [-\mu_1 \beta + \tfrac12 \sigma_1^2 \beta^2 - \alpha] = 0$, for all $x \in S$,

$D_2 V_{f_1}(x) - \alpha V_{f_1}(x) = R e^{-\beta x} [-\mu_2 \beta + \tfrac12 \sigma_2^2 \beta^2 - \alpha]$, for all $x \in S$,

and $V_{f_1}(0) = R$. Therefore $V_{f_1}$ will be the optimal return function and $f_1$ an optimal stationary policy if and only if

(3)  $R [\tfrac12 \sigma_2^2 \beta^2 - \mu_2 \beta - \alpha] \ge 0$.

Condition (3) can be shown to be equivalent to

(4)  $R \big[ (\mu_1 + \sqrt{\mu_1^2 + 2\alpha\sigma_1^2}) (\mu_1 \sigma_2^2 - \mu_2 \sigma_1^2) + \alpha \sigma_1^2 (\sigma_2^2 - \sigma_1^2) \big] \ge 0$,

and further inspection leads us to conclude that under any of the following combinations of diffusion parameters and interest rate it will always be optimal to use control mode 1:
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[1] 


[2] 


[31 


R > 0 

2 2 
°!  < ct2  , 


R > 0 

2 2 
°1  > °2 

41  - ^2  > 0 
2 

_ fi 

"*  4 


a < 


2 2 

2( ^ la2  ■ ^2°l)  (^r^2) 

7~2  272 

(a.  - o ) 


R > 0 

2 2 
al  > a2 

> 0 > M2 
Mi  t M2 


a < 


2 2 

2(mi^2  " ^2°1^  [^1*^2^ 
P?  272 

( °1  - °2> 


[4] 


R > 0 

2 2 
CTl>a2 

0 > ,al  - ^2 


_2(V2‘  ^2CT1>  ^1-^2) 
- . 2 2.2 
(J1  “ a2> 
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R < 0 

2 2 
°1  > a2 

> 0 


2 2 

m1°2  ' ^2°  l - 0 > 


R < 0 

2 2 

°l>02 

M-  ]_  > Hg  > 0 
2 

u„  2 


2 2 

2(kkj^g  ” (n^"Mg) 

T~2  272 

(°i  " ap) 


R < 0 

2 2 
al  > a2 

Hj  > 0 > |ig 


U1  ^ ^ 


2 2 

2(^l°g  - UgOj_)  (Pj-Hg) 
(CTt  - °g) 


R < 0 

2 2 
al>02 

O > ^.  > 


2 2 

2(n1ag  - pigff^  (lij-Hg) 

. 2 2.2 

(al“°2) 


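Next consider the policy of always using control mode 2. Similar analysis results in a return function $V_{f_2}$ that is a function of state only and in the following:

$V_{f_2}(x) = R e^{-\rho x}$, for all $x \in S$,

where

$\rho = \dfrac{\mu_2 + \sqrt{\mu_2^2 + 2\alpha\sigma_2^2}}{\sigma_2^2}$,

$D_1 V_{f_2}(x) - \alpha V_{f_2}(x) = R e^{-\rho x} [-\mu_1 \rho + \tfrac12 \sigma_1^2 \rho^2 - \alpha]$, for all $x \in S$,

$D_2 V_{f_2}(x) - \alpha V_{f_2}(x) = R e^{-\rho x} [-\mu_2 \rho + \tfrac12 \sigma_2^2 \rho^2 - \alpha] = 0$, for all $x \in S$,

and $V_{f_2}(0) = R$. Therefore $V_{f_2}$ is the optimal return function if and only if

(5)  $R \big[ (\mu_2 + \sqrt{\mu_2^2 + 2\alpha\sigma_2^2}) (\mu_2 \sigma_1^2 - \mu_1 \sigma_2^2) + \alpha \sigma_2^2 (\sigma_1^2 - \sigma_2^2) \big] \ge 0$.

We thus find it optimal to use mode 2 always whenever one of the following combinations arises: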


[9] 


R < 0 

2 2 
CT1  ^ a2 
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R > 0 


2 2 
°l>a2 

Hi  > t-*o  > 0 


— >-= 


a > 


2 2 

2(h2<j1  - n^)  (Hg"^) 

~~2  272 

(°o  ~ 


R > 0 


> 0 > h2 
**1  ^ »2 

„ . ^'Vl  - ^la2>  ^2’»*l) 

r-\  r\ 


[16] 


R > 0 

2 2 
al  >02 

0 > > ^2 


a > 


2 2 

. 2 2.2 

(°2’°1> 


The combinations [1] through [16] exhaust all possible values for the diffusion parameters and the positive interest rate. Hence we have completely solved the death penalty problem and have shown that an optimal single band policy always exists. That is, it will be optimal either to always use control mode 1 regardless of initial state and initial mode, or to always use mode 2.
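The case analysis of this section reduces to a sign test for each single band policy, and that test is easy to evaluate numerically. The sketch below uses the unexpanded form of the test (condition (3) for mode 1 and its mirror image for mode 2) rather than the expanded forms (4) and (5); the function names and the sample parameters are illustrative assumptions.

```python
import math


def decay_rate(mu, var, alpha):
    """Non-negative root of alpha + mu*b - 0.5*var*b**2 = 0."""
    return (mu + math.sqrt(mu * mu + 2.0 * alpha * var)) / var


def single_band_tests(mu1, var1, mu2, var2, alpha, R):
    """Return (mode1_optimal, mode2_optimal) for the death penalty problem.

    mode1_optimal evaluates condition (3); mode2_optimal evaluates the same
    expression with the roles of the two modes exchanged.
    """
    beta = decay_rate(mu1, var1, alpha)   # exponent of V_{f_1}
    rho = decay_rate(mu2, var2, alpha)    # exponent of V_{f_2}
    mode1_optimal = R * (0.5 * var2 * beta ** 2 - mu2 * beta - alpha) >= 0.0
    mode2_optimal = R * (0.5 * var1 * rho ** 2 - mu1 * rho - alpha) >= 0.0
    return mode1_optimal, mode2_optimal


# Illustrative parameters with sigma_1^2 < sigma_2^2 (assumed values).
penalty_case = single_band_tests(1.0, 1.0, 0.0, 2.0, 0.1, R=1.0)
reward_case = single_band_tests(1.0, 1.0, 0.0, 2.0, 0.1, R=-1.0)
```

With these sample values the test selects mode 1 when $R$ is positive and mode 2 when $R$ is negative, in line with the remark that reversing the sign of $R$ reverses the optimal mode.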
5.2.  Absorption and No Switching Costs

We treat here the general two control absorbing barrier problem with linear holding cost rate $h$, operational cost rates $r_1$ and $r_2$ for mode 1 and mode 2, respectively, boundary cost $R$, zero switching costs and positive interest rate $\alpha$. For a given admissible strategy $\pi$ we have the following value function

$V_\pi(x,a) = E\Big[ \int_0^T e^{-\alpha t} [h X(t|x,\pi) + r_{\pi(t)}]\, dt + R e^{-\alpha T} \Big]$, $(x,a) \in S \times A$,

where $T$ is the time of absorption for the controlled process $X(\cdot|x,\pi)$. The next proposition will allow us to rid our cost structure of all linear holding costs.


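Proposition 1. Let $\pi$ be an arbitrary admissible strategy and let $x \in S$. Then

$E\Big[ \int_0^T e^{-\alpha t} X(t|x,\pi)\, dt \Big] = \dfrac{x}{\alpha} + E\Big[ \int_0^T \dfrac{1}{\alpha} e^{-\alpha t} \mu_{\pi(t)}\, dt \Big]$,

where $T = \inf\{t \ge 0 : X(t|x,\pi) = 0\}$.

Proof. Fix admissible strategy $\pi$ and state $x$. Let $\mu : [0,\infty) \to \{0, \mu_1, \mu_2, \ldots, \mu_N\}$ and $\sigma : [0,\infty) \to \{0, \sigma_1, \sigma_2, \ldots, \sigma_N\}$ be the following functions:

$\mu(t) = \mu_{\pi(t)}$ if $t \le T$, and $\mu(t) = 0$ if $t > T$,

$\sigma(t) = \sigma_{\pi(t)}$ if $t \le T$, and $\sigma(t) = 0$ if $t > T$.

Now define the following Ito process $\{X(t);\ t \ge 0\}$,

$X(t) = x + \int_0^t \mu(u)\, du + \int_0^t \sigma(u)\, dB(u)$, $t \ge 0$,

and note that

$E\Big[ \int_0^T e^{-\alpha t} X(t|x,\pi)\, dt \Big] = E\Big[ \int_0^\infty e^{-\alpha t} X(t)\, dt \Big]$.

Since $\sigma$ is bounded on $[0,\infty)$ we have that

$E\Big[ \int_0^t \sigma(u)\, dB(u) \Big] = 0$, for all $t \in [0,\infty)$.

Hence

$E[X(t)] = x + E\Big[ \int_0^t \mu(u)\, du \Big]$, for all $t \in [0,\infty)$,

and a simple change in the order of integration leads to the desired result. □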
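The last step of the proof, the change in the order of integration, can be checked numerically on a single deterministic drift path. The sketch below is only a sanity check of that interchange under simplifying assumptions: the drift equals a constant `mu` up to a fixed time `t_stop` and zero afterwards, there is no noise, and the infinite horizon is truncated. All names and values are hypothetical.

```python
import math


def discounted_level_integral(x, mu, t_stop, alpha, horizon=200.0, n=200_000):
    """Midpoint-rule value of the integral over [0, horizon] of
    exp(-alpha*t) * (x + mu*min(t, t_stop)) dt.

    The integrand is the discounted level of a noise-free path whose drift is
    mu until t_stop and zero afterwards.
    """
    dt = horizon / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += math.exp(-alpha * t) * (x + mu * min(t, t_stop)) * dt
    return total


def swapped_order_value(x, mu, t_stop, alpha):
    """x/alpha + integral over [0, t_stop] of (mu/alpha)*exp(-alpha*u) du, in closed form."""
    return x / alpha + (mu / alpha ** 2) * (1.0 - math.exp(-alpha * t_stop))


# Assumed sample values: start at 2, drift -0.7 until time 3, discount rate 0.5.
left_side = discounted_level_integral(2.0, -0.7, 3.0, 0.5)
right_side = swapped_order_value(2.0, -0.7, 3.0, 0.5)
```

The two numbers agree to within the quadrature error, which mirrors the identity of Proposition 1 with the expectation removed.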


So  returning  to  our  control  problem,  let 
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By  virtue  of  the  above  proposition,  we  then  have  that 


V >a)  = % ' 5 + Ek  e"at  K(t)dt  + R e'at] » <x>a) € SxA > 


for  any  admissible  strategy  y.  Hence  minimizing  V (x,a)  over  all 


-at  - ..  = -or-, 

t)dt  + Re  ]> 


admissible  strategies  is  equivalent  to  minimizing  E[f  e r , . 

0 7r' 

and  we  have  reduced  the  original  problem  to  one  where  there  are  no  linear 


holding  costs,  one  zero  operational  cost,  and  one  non-negative  operational 
cost. 

Let us look first at the case $r_1 = 0$. Assume also that $r_2 > 0$, since otherwise we would have the death penalty problem which has already been solved. Since the switching costs are zero, the return function associated with any stationary policy $f$ is such that $V_f(x,1) = V_f(x,2)$ for all $x \in S$. We therefore, as before, represent the return function $V_f$ as a function of state only.

We begin with the single band policy $f_1$ of always using mode 1 and recall from Section 5.1 that

$V_{f_1}(x) = R e^{-\beta x}$, for all $x \in S$,

where

$\beta = \dfrac{\mu_1 + \sqrt{\mu_1^2 + 2\alpha\sigma_1^2}}{\sigma_1^2}$.

Likewise, the single band policy $f_2$ of always using control mode 2 leads to

$V_{f_2}(x) = \dfrac{r_2}{\alpha} + \Big[ R - \dfrac{r_2}{\alpha} \Big] e^{-\rho x}$, for all $x \in S$,

where

$\rho = \dfrac{\mu_2 + \sqrt{\mu_2^2 + 2\alpha\sigma_2^2}}{\sigma_2^2}$.

We then have the following relationships on $S$:

(6)  $D_1 V_{f_1}(x) - \alpha V_{f_1}(x) = 0$,

(7)  $D_2 V_{f_1}(x) - \alpha V_{f_1}(x) + r_2 = r_2 + [\tfrac12 \sigma_2^2 \beta^2 - \mu_2 \beta - \alpha] R e^{-\beta x}$,

(8)  $V_{f_1}(0) = R$,

(9)  $D_1 V_{f_2}(x) - \alpha V_{f_2}(x) = -r_2 + [\tfrac12 \sigma_1^2 \rho^2 - \mu_1 \rho - \alpha] \Big[ R - \dfrac{r_2}{\alpha} \Big] e^{-\rho x}$,

(10)  $D_2 V_{f_2}(x) - \alpha V_{f_2}(x) + r_2 = 0$,

and

(11)  $V_{f_2}(0) = R$.

Let $A_1$ and $A_2$ denote the following two functions of the parameters $(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2, \alpha)$:

$A_1 = \tfrac12 \sigma_2^2 \beta^2 - \mu_2 \beta - \alpha$,

and

$A_2 = \tfrac12 \sigma_1^2 \rho^2 - \mu_1 \rho - \alpha$.

The optimality conditions of Theorem 1, Chapter 4 will then be satisfied for policy $f_1$ if

(12)  $r_2 + A_1 R e^{-\beta x} \ge 0$, for all $x \in S$,

and will be satisfied for $f_2$ if

(13)  $-r_2 + A_2 \Big[ R - \dfrac{r_2}{\alpha} \Big] e^{-\rho x} \ge 0$, for all $x \in S$.
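Because $e^{-\beta x}$ and $e^{-\rho x}$ range over $(0, 1]$ on $S$, conditions (12) and (13) hold for every $x$ exactly when they hold both at $x = 0$ and in the limit $x \to \infty$. The sketch below encodes that observation; the function names and sample parameters are illustrative assumptions rather than anything prescribed by the text.

```python
import math


def decay_rate(mu, var, alpha):
    """Non-negative root of alpha + mu*b - 0.5*var*b**2 = 0."""
    return (mu + math.sqrt(mu * mu + 2.0 * alpha * var)) / var


def single_band_conditions(mu1, var1, mu2, var2, alpha, r2, R):
    """Return (cond12_holds, cond13_holds) for the case r_1 = 0."""
    beta = decay_rate(mu1, var1, alpha)
    rho = decay_rate(mu2, var2, alpha)
    a1 = 0.5 * var2 * beta ** 2 - mu2 * beta - alpha
    a2 = 0.5 * var1 * rho ** 2 - mu1 * rho - alpha
    # (12): r2 + a1*R*exp(-beta*x) lies between r2 (x -> infinity) and r2 + a1*R (x = 0).
    cond12 = min(r2, r2 + a1 * R) >= 0.0
    # (13): -r2 + c*exp(-rho*x) lies between -r2 (x -> infinity) and -r2 + c (x = 0).
    c = a2 * (R - r2 / alpha)
    cond13 = min(-r2, -r2 + c) >= 0.0
    return cond12, cond13


# Assumed sample values; with these numbers beta = 1 and A_1 = 1.
with_penalty = single_band_conditions(0.0, 1.0, -1.0, 1.0, 0.5, r2=1.0, R=1.0)
with_reward = single_band_conditions(0.0, 1.0, -1.0, 1.0, 0.5, r2=1.0, R=-3.0)
```

In the second sample case neither single band policy passes its test, which is the kind of parameter combination that motivates looking beyond the policies $f_1$ and $f_2$.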


We have seen in evaluation of (4) that $A_1 R \ge 0$ if and only if one of the situations [1] through [8] holds true. Therefore the policy of always using mode 1 is optimal whenever we have one of the following cases:


[171 


l r2  > 0,  R > 0 

{ 2 2 

l °1  < CT2  » 
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r 2 > 0,  R < 0 

A*  - r2 
2 2 
°1  > °2 

Hi  > H2  > 0 
2 

^ >I_1 

"2  °2 


a < — 


2(^la2  ' ^2al)  ^r^?) 
( al  - a2) 


r2  > 0,  R < 0 

"V  5 r2 
2 2 
CT1  > a2 

U,  > 0 > Ho 


2 2 

2^1°2  ” ^2°1^ 

— 2 272 

c CT1  - ao) 


r 2 > 0,  R < 0 

"V  < r2 
2 2 
°1  > a2 

0 > Hi  > Up 

„ „ S(iil4  - ^4) 

, 2 2.2 
(CT1  ' a2) 


1 


[291 


[ 30l 


[311 


[32] 


r2  > 0,  R < 0 

2 2 
°1  > a2 


> 


ll  - K2  - 


> 0 


2 2 

^1^2  ~ ^2^1  — ^ » 


r2  > 0,  R < 0 

2.2 
CT1  > a2 

> t^2  > ° 

2 

Hi  >^1 


a > 


2 2 
2( M- 1°2  " la2<Tl) 

— 2 2^ 

(<*!  - a2) 


r2  > 0,  R < 0 

2 2 
°1  > °2 

UX  > 0 > n2 

/ n2 


a > 


2(m1°2  - ^2°\)  (^i'm2^ 


, 2 2.2 

(«!_  - oz) 


r0  >0,  R < 0 

2 2 
al  > a2 

0 > > n2 


a > 


2 2 

2( M ia2  “ ^2ai^ 

~P.  2^2 

(01  “ °2> 
Inspection of (13) shows that for $r_2 > 0$ and every set of diffusion parameters

$\lim_{x \to \infty} \Big\{ -r_2 + A_2 \Big[ R - \dfrac{r_2}{\alpha} \Big] e^{-\rho x} \Big\} < 0$.

Hence the policy of always using mode 2 cannot be optimal. We have then thoroughly investigated the optimality of the single band policies $f_1$ and $f_2$. However, the above parameter combinations [17] through [32] do not totally exhaust the possible range of parameters.

So let us next look at the following two band stationary policy $f_z$:

$f_z(x,1) = f_z(x,2) = 2$ if $x \in [0,z)$, and $f_z(x,1) = f_z(x,2) = 1$ if $x \in [z,\infty)$.

This control policy selects actions according to a function of current state only and is characterized by the single critical number $z \in S$. Whenever the state of the system is $z$ or greater control mode 1 is to be used, and whenever the state of the system is below level $z$ policy $f_z$ selects mode 2.
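The two band policy is simple enough to state directly as a function of the state and the critical number. This is a minimal sketch; the function name is hypothetical, and the current mode is omitted as an argument because the policy does not depend on it.

```python
def two_band_policy(x, z):
    """Mode selected by f_z in state x: mode 2 below the critical number z, mode 1 at or above it."""
    if x < 0 or z < 0:
        raise ValueError("state and critical number must be non-negative")
    return 2 if x < z else 1


# With z = 0 the policy degenerates to always using mode 1.
modes_along_grid = [two_band_policy(x, 1.0) for x in (0.0, 0.5, 1.0, 3.0)]
```

Treating the boundary state $x = z$ as belonging to the upper band matches the half-open intervals $[0,z)$ and $[z,\infty)$ in the definition.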

Again let $\pi(x,a)$ denote the admissible strategy corresponding to policy $f_z$, initial state $x$ and initial control mode $a$, and let $T_x(y) = \inf\{t \ge 0 : X(t|x,\pi(x,a)) = y\}$ for all $y \in S$. We then see that the following hold for the return function $V_{f_z}$:
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for  x £ [z,o»)  . 


From (15) we get

(16)  $V_{f_z}(x) = V_{f_z}(z)\, e^{-\beta(x-z)}$, for $x \ge z$,

since as before we can replace $E[e^{-\alpha T_x(z)}]$ by the exponential function $\exp(-\beta(x-z))$ on $[z,\infty)$.

We now wish to evaluate the two expectations in (14). Define $\varphi$ and $\psi$ to be the following functions on $[0,z]$:

$\varphi(x) = 0$ if $x = 0$; $\varphi(x) = E[e^{-\alpha T_x(z)};\ T_x(z) < T_x(0)]$ if $0 < x < z$; $\varphi(x) = 1$ if $x = z$,

and

$\psi(x) = 1$ if $x = 0$; $\psi(x) = E[e^{-\alpha T_x(0)};\ T_x(0) < T_x(z)]$ if $0 < x < z$; $\psi(x) = 0$ if $x = z$.

For $x \in [0,z]$ let $\{X(t);\ t \ge 0\}$ be the Brownian Motion starting in state $x$ with drift parameter $\mu_2$, variance parameter $\sigma_2^2$ and absorption at levels zero and $z$. Associating first with process $X$ zero holding costs, zero operational costs, zero switching costs, and absorption costs of zero and one at the origin and level $z$, respectively, we can see that $\varphi(x)$ represents the expected total discounted costs in this setting. If instead the absorption costs are one at the origin and zero at level $z$, then $\psi(x)$ represents the expected total discounted costs. Theorem 4 of Chapter 3 characterizes the return functions associated with stationary policies and so leads us to the condition

$D_2 \varphi(x) - \alpha \varphi(x) = D_2 \psi(x) - \alpha \psi(x) = 0$, for all $x \in [0,z]$.

Returning to calculation of (20), let $\gamma$ be the function on $S$ defined as

$\gamma(x) = E[e^{-\alpha T}]$, for all $x \in S$,

where $T$ is the first hitting time of zero by the unrestricted Brownian Motion starting in state $x$ with drift parameter $\mu_2$ and variance parameter $\sigma_2^2$. We have shown before that

$\gamma(x) = e^{-\rho x}$, for all $x \in S$,

and we note that

(18)  $\gamma(x) = \psi(x) + \varphi(x)\, \gamma(z)$, for all $x \in [0,z]$.


For $x \in (-\infty, z]$ let $\{Z(t);\ t \ge 0\}$ be the Brownian Motion starting in state $x$ with drift parameter $\mu_2$, variance parameter $\sigma_2^2$ and absorption at level $z$ (but no absorption at zero). Then the function $\xi$ defined on $(-\infty, z]$ by

$\xi(x) = E[e^{-\alpha T^*}]$,

where $T^* = \inf\{t \ge 0 : Z(t) = z\}$, must be an increasing exponential function in $x$. That is,

$\xi(x) = e^{-\eta(x-z)}$, for $x \in (-\infty, z]$,

where $\eta$ is a real-valued function of the parameters $(\mu_2, \sigma_2^2, \alpha)$ and where $\eta$ is non-positive. Assessing only a boundary cost of one to the process $Z$ at the level $z$, we apply Theorem 4, Chapter 3 to get

$D_2 \xi(x) - \alpha \xi(x) = 0$, for all $x \in (-\infty, z]$.

Therefore we want $\eta$ to be the non-positive solution to the quadratic equation

$\alpha + \mu_2 \eta - \tfrac12 \sigma_2^2 \eta^2 = 0$,

which means that

$\eta = \dfrac{\mu_2 - \sqrt{\mu_2^2 + 2\alpha\sigma_2^2}}{\sigma_2^2}$.

Since

(19)  $\xi(x) = \varphi(x) + \psi(x)\, \xi(0)$, for all $x \in [0,z]$,

we have in (18) and (19) two equations in the two unknowns $\varphi(x)$ and $\psi(x)$. The solutions are as follows:

$\varphi(x) = \dfrac{e^{-\eta x} - e^{-\rho x}}{e^{-\eta z} - e^{-\rho z}}$, for $x \in [0,z]$,

$\psi(x) = e^{-\rho x} - \Big[ \dfrac{e^{-\eta x} - e^{-\rho x}}{e^{-\eta z} - e^{-\rho z}} \Big] e^{-\rho z}$, for $x \in [0,z]$.
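The two-sided exit functions $\varphi$ and $\psi$ can be evaluated directly from the roots $\rho$ and $\eta$, and the relations (18) and (19) then serve as a consistency check. The sketch below is illustrative: the function names and the sample values of $\mu_2$, $\sigma_2^2$, $\alpha$ and $z$ are assumptions, and $\psi$ is written in its combined-fraction form, which is algebraically the same as the expression above.

```python
import math


def mode2_roots(mu2, var2, alpha):
    """Roots of alpha + mu2*b - 0.5*var2*b**2 = 0: (rho >= 0, eta <= 0)."""
    s = math.sqrt(mu2 * mu2 + 2.0 * alpha * var2)
    return (mu2 + s) / var2, (mu2 - s) / var2


def phi(x, z, rho, eta):
    """Discounted probability of reaching level z before zero, starting from x in [0, z]."""
    return (math.exp(-eta * x) - math.exp(-rho * x)) / (math.exp(-eta * z) - math.exp(-rho * z))


def psi(x, z, rho, eta):
    """Discounted probability of reaching zero before level z, starting from x in [0, z]."""
    numerator = math.exp(-eta * z - rho * x) - math.exp(-rho * z - eta * x)
    return numerator / (math.exp(-eta * z) - math.exp(-rho * z))


# Assumed sample values (z > 0 is required so the denominator is non-zero).
rho, eta = mode2_roots(-1.0, 1.0, 0.5)
z, x = 1.3, 0.4
gap_18 = math.exp(-rho * x) - (psi(x, z, rho, eta) + phi(x, z, rho, eta) * math.exp(-rho * z))
gap_19 = math.exp(-eta * (x - z)) - (phi(x, z, rho, eta) + psi(x, z, rho, eta) * math.exp(eta * z))
```

Both gaps vanish up to rounding, and the boundary values $\varphi(0) = 0$, $\varphi(z) = 1$, $\psi(0) = 1$, $\psi(z) = 0$ follow from the same formulas.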

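We can now write (14) as

(20)  $V_{f_z}(x) = \dfrac{r_2}{\alpha} + \Big[ V_{f_z}(z) - \dfrac{r_2}{\alpha} \Big] \Big[ \dfrac{e^{-\eta x} - e^{-\rho x}}{e^{-\eta z} - e^{-\rho z}} \Big] + \Big[ R - \dfrac{r_2}{\alpha} \Big] \Big[ \dfrac{e^{-\eta z - \rho x} - e^{-\rho z - \eta x}}{e^{-\eta z} - e^{-\rho z}} \Big]$, for all $x \in [0,z]$,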


and  there  remains  only  the  unknown  value  Vf  ( z)  in  our  explicit 

z 

solution  of  the  return  function  associated  with  policy  f^.  Using 

condition  (J2)  of  Theorem  4 in  Chapter  3 we  can  characterize  ( z) 

z 

from  (16)  and  (20)  by 
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Thus  we  can  complete  the  explicit  solution  of  Vf  on  S as  follows 

z 

(23)  VfM  - e-°z)2  t(p-e)  e'PZ  - (,-0)  e'’>2]J  1 


• ((p-S)^02  - (,-P)e-12]  (1  - e-lt*-2).*-0*) 

- e-p2( e"r,2-e"pz)  ((p-p)*'02  - ( ,-P)e-''2)(  1 - .-»<*-*> -e’ 1*) 

* (p*-°2  - ne-32)  [oe"p2(  1-e"^2)  - t]e_I*2(  l-e'pz)  ]( e"3x-e"px)l 

* nj(e"l2  - p-p2)2  [(p-P)e'02  - (n-p)e'’'2]]'1 

.je-ll“)V'2  - e"pz) [(p-p)e-pz  . (,.p)e-32J(e-p(x-2).e-r'<x-2)) 

♦ (p-,)e-(^p)2(pp-p2  - ne-''2)(s-’l’<.a-p>t)|  ( 

for  0 < x < z f 
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and 


(2*0  V (x)  = [(e"1*2  - -pZ)  [ ( p-£)e"pZ  - ( n-Pje"1**]]’ * 

' | (pe-pZ  - t^2)  [pe”pZ(  l-e_T*Z)  - ^Z(  l-e"pz)  ]e"«  X_Z 
+ R[(e'nZ  - e'pZ)  [(p-3)e"pZ  - ( ^e-1*2]]’1 

* {(0-n)e-(Tl+p>Z(pe-pZ  - T1e-^)e-^X-2)}  , 

for  x > z . 

Let us turn now to the necessary and sufficient optimality conditions of Theorem 1, Chapter 4. We immediately verify that

$D_2 V_{f_z}(x) - \alpha V_{f_z}(x) + r_2 = \Big[ V_{f_z}(z) - \dfrac{r_2}{\alpha} \Big] [D_2 \varphi(x) - \alpha \varphi(x)] + \Big[ R - \dfrac{r_2}{\alpha} \Big] [D_2 \psi(x) - \alpha \psi(x)] = 0$, for all $x \in [0,z]$,

$D_1 V_{f_z}(x) - \alpha V_{f_z}(x) = V_{f_z}(z) [\tfrac12 \sigma_1^2 \beta^2 - \mu_1 \beta - \alpha] e^{-\beta(x-z)} = 0$, for all $x \in [z,\infty)$,

and $V_{f_z}(0) = R$, and we shall judiciously choose our critical number $z$ so as to satisfy the remaining optimality conditions. Those two remaining optimality conditions are concerned with showing the non-negativity of $D_1 V_{f_z}(x) - \alpha V_{f_z}(x)$ on $[0,z]$ and the non-negativity of $D_2 V_{f_z}(x) - \alpha V_{f_z}(x) + r_2$ on $[z,\infty)$. Letting $A_3$ denote the following function of the parameters $(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2, \alpha)$,

$A_3 = \tfrac12 \sigma_1^2 \eta^2 - \mu_1 \eta - \alpha$,

we see that

(25)  D V (x)  - QVf  (x) 
z z 

= -J[(e_TlZ  - e_pZ)2  [(p-fJ)e'pz  - ( 1 

• ( e-T*Z(  e”pZ  - e”^Z)  [(p-e)e-pZ  - ( r^e"^]  [A^e^^^e-P^a] 

+ e’pZ(e"pZ-e-T1Z)[(p-3)e'pZ-(T1-p)e-TlZ][A2e-p(x-Z)^5e-T'x+0!] 

+ ( pe”pZ- Tje” T1Z)  [ pe”pz(  1-e' nZ)  - T]e_ ^Z(  l-e'pz)  1 [A^e' 1 j 

+ R^(  e_r*Z  - e'pZ)2  [( p-f3)e”pZ  - ( n-P)e",,‘]]  1 

. | e‘(T>+p)2(e"TlZ-e"pZ)[(D-p)e'pZ-(T1-p)e"TlZl[A2e‘p(x"z)-A5e'Tl(x'z)] 

+ (p-n)e'(Tl+p)z(pe'pZ-,1e’Tlz)tA5e'TlX^2e_pX]|  , 
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(26)  D2  V - OV  + r2 
z z 

[(e',1I.e‘pl)t(p.p)e'pz  - ( j”1 

. |(pe“p*-ne‘,I‘)[pe-p*(l-e-Tl8)  - l-e“pz)  ] 

+ R^(e"Tlz-e-pz)[(p.p)e-oz).  ( J 1 

•{(o-t,)e-<T»tp)*(pe-pz-ne”>,)A1  + r2  , 

for  x > z 


Thus, defining the function $H$ on $S$ as

(27)  $H(x) = D_2 V_{f_z}(x) - D_1 V_{f_z}(x) + r_2$, for all $x \in S$,

we wish to find a $z \in S$ such that $H(x) \le 0$ for $x \in [0,z)$, $H(z) = 0$, and $H(x) \ge 0$ for $x \in (z,\infty)$. The absence of switching costs in the optimality conditions leads us to require that $V_{f_z}''(z-) = V_{f_z}''(z+)$, and so we have the following equation

(28)  $\dfrac{r_2}{\alpha} \beta \big[ \rho(\rho-\beta) e^{-\rho z} - \eta(\eta-\beta) e^{-\eta z} \big] + \Big( R - \dfrac{r_2}{\alpha} \Big) (\rho-\eta)(\beta-\rho)(\eta-\beta) e^{-(\eta+\rho) z} = 0$.

View the left hand side of (28) as a function of $z \in [0,\infty)$, and let $F(z)$ denote its value. In our analysis of $F(z)$ we may restrict attention to the situations
(29)  $r_2 > 0$, $R > 0$, $-A_1 R > r_2$

(30)  $r_2 > 0$, $R < 0$, $-A_1 R > r_2$

since we have shown policy $f_1$ to be optimal otherwise.

Much  tedious  manipulation  of  (27)  and  (28)  under  situations  (29) 
and  (30)  implies  separating  the  remaining  parameter  combinations  into 
two  groups.  Group  I includes: 


r2  > 0,  R < 0 

■V  > r2 

M2  < 0 

2 2 

CT  l < °2 


^ 2(u2°l  - ^la2> 


r2  > 0,  R < 0 

"V  > r2 

n2  > 0 

2 2 

al  ^ °2  > 


V 


r2  > 0,  R > 0 


-V  > r2 


2 2 
al>CT2 


0 > > ^2 


2 2.  k 

-42ai(1-2a2)  < 


a > 


2 2 
2(41a2-n2a1) 


/ 2 2,2 


Group  II  includes: 


r2  > 0,  R < 0 


-V  > r2 


ti2  < 0 


2 2 
°1  — °2 


a < 


2 2 

2^2(|i20rllla2^ 


r2  > 0,  R < 0 


-V  > r2 


2 2 
°1  > a2 


> 0 > ^2 


^1  ^ w2 


-n2cri(  l-2a2)  < 


a < 


2 2 

^2^2ar4lCT2^ 
r2  > 0,  R < 0

1 R > r2 
2 2 
al  > a2 

^ > 0 > Hg 
^1  ^ ^2 

2, , _ 2.  . 4 

-V*2ai(  l~2a2>  >^1°2 

2 2 
2( I-1  ^CT2"t^2CT^) 

a — 2 2>2 


rg  > 0,  R < 0 

AR  > r2 
2 2 
CT1  > a2 

0 > - ^2 

-U2ai(  i-2^)  5 ^la2 

2n2(n2Vn1a|) 


r2  > 0,  R < 0 

^1R  > r2 
2 2 
CT1  >a2 


< 0 > H.  > \Xr 

k L c 


2 2«  4 

l-SOg)  >^^2 

__  2(u1a2-n2ari)  (Hj-Ug) 
Q!  < 2 2 2 

( Va2) 


r2  > 0,  R > 0 


^1R  > r2 
2 2 
CT1  > a2 

0 > - ^2 

2/i  , 2,  k 

-U2a1(  l-2a2'  >^l°2 
2(n  i<J2"^2<:Jl^  (^i“^2^ 

(<yo2> 


< a < 


2M2(U2VV1<J2) 


With each of the parameter combinations in Group I, $F(0) > 0$ and $\lim_{z \uparrow \infty} F(z) < 0$. Hence there exists a positive solution z to the equation $F(z) = 0$. Call the first such root $z_1$; that is, $z_1 > 0$ is such that $F(z_1) = 0$ and

$F(z) > 0$ ,  for all $z \in [0,z_1)$ .


Let G be the following function on $[0,\infty)$

(31)  $G(z) = (\rho-\eta)\,F(z) + F'(z)$ ,

and upon substitution of (28) into (31), we find that G is everywhere negative for the parameter combinations in Group I. Therefore, $F'(z) < 0$ for all $z \in [0,z_1]$. Now suppose that there exists another solution to $F(z) = 0$. Call the next such root $z_2$; that is, $z_2 > z_1$ is such that $F(z_2) = 0$ and

$F(z) \ne 0$ ,  for all $z \in (z_1,z_2)$ .

Thus it must be that

$F(z) < 0$ ,  for all $z \in (z_1,z_2)$ ,

and that $F'(z_2) \ge 0$. This however contradicts the negativity of $G(z_2)$, and we conclude that under the Group I cases there uniquely exists a positive solution z to (28). Similarly, with each of the parameter combinations in Group II we have that $F(0) < 0$, $\lim_{z \uparrow \infty} F(z) > 0$ and $G(z)$, as defined by (31), is everywhere positive on $[0,\infty)$. So again there exists a unique $z > 0$ satisfying (28).
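
The existence argument above also suggests a simple numerical procedure: when F changes sign exactly once on [0, oo), the critical number can be bracketed by scanning outward from zero and then located by bisection. The following Python sketch illustrates this under stated assumptions. The function `toy_F` is a hypothetical stand-in with the Group I sign pattern (positive at zero, eventually negative); it is not the left hand side of (28), and the helper names `bracket_root` and `bisect_root` are ours.

```python
import math

def bracket_root(F, step=0.5, z_max=1e6):
    """Scan outward from z = 0 until F changes sign; return the bracket (lo, hi)."""
    lo, f_lo = 0.0, F(0.0)
    hi = step
    while hi <= z_max:
        f_hi = F(hi)
        if f_lo * f_hi <= 0.0:
            return lo, hi
        lo, f_lo = hi, f_hi
        hi += step
    raise ValueError("no sign change found on [0, z_max]")

def bisect_root(F, lo, hi, tol=1e-12):
    """Bisection on a bracket [lo, hi] on which F changes sign."""
    f_lo = F(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = F(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def toy_F(z):
    # Hypothetical stand-in: positive at 0, decreasing, negative for large z.
    return 2.0 * math.exp(-z) - 1.0

lo, hi = bracket_root(toy_F)
z_star = bisect_root(toy_F, lo, hi)
```

Because the scan stops at the first sign change, the step size only matters when F could have several roots; under the uniqueness established above any step would do.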

Given our parameter restrictions (29) and (30) and the positive critical number z uniquely defined by (28), we now evaluate the function H for policy $f_z$. Expanding (27) gives


,,,,  H(x>  r2|-(p-n)  .-‘•‘nft 

<32)  "(,1)  - “ * eu-V  - J L 


e-1*  - A2  .-»* 

e-l*  - e'oz  J 


L .'I2  - J 

— + r2  , for  0 < x < z , 


•pe^Z  - qepZ  ♦ P(epZ  - e'*)- 


e-qx-pz  _ e-px-qz 
e" ^Z  - e-pZ 


r0  r -pz  - qz 

(55)  "(->  - -g  Kp.  ~ M ~ 

+ Rr i£ziL 

l*pe^Z  - qepZ  + f 


(p-n) 

We"'2  - e"pz) 


0.1*  - n.°*  ♦ 9(.dz  - .1Z) 


♦ r. 


for  x > z . 


- .Ji 


Subsequent substitution of (28) into (32) and (33) results in $H(z) = 0$, and it is exactly our parameter situations (29) and (30) that guarantee the negativity of H on $[0,z)$ and the positivity of H on $(z,\infty)$.

Thus the single critical number policy that uses control mode 1 whenever the state of the system is above level z and mode 2 when the state is below z, where z is the unique solution to (28), is optimal for the following explicit parameter combinations:


r2  > O,  R > 0 
> r2 


[33]


[34]


2 2 
al  > a2 

P1  - M2  > 0 
2 2 
ul°2  " u2al  - 0 


r2  > 0,  R > 0 

^1R  > r2 

2.2 

°1  >a2 

Ui  > >0 
2 

Hi  ^Hi 

Ho  n2 
<- 


a > 


2 2 

2(VLla2  ~ ^l^  ^1^ 
T2  272 

(a,  - On) 


r2  > 0,  R < 0 


[39]


and 


-AjR  > r2 

2 2 
°1  >a2 

> 0 > n2 

“l  * ^2 

2 2 

2(ti1a2  - \i2<3-\)  (Uj-Hg) 

a < § 275 

(®x  - 


[40]


r2  > 0,  R < 0 

-AjR  > r2 

2^2 
°1  >ff2 

0 > 


a < 


2 2 

2(^2  - (pj-n2) 

7“2  272 

(°1  - o2) 


There then remains the case when operational cost $r_2$ is zero and operational cost $r_1$ is positive. The analysis is repetitious of that discussed above for $r_1 = 0$ and $r_2 > 0$, and so will be omitted here. The results, however, are stated as follows. The simple band policy of always using control mode 2 is optimal when the cost and diffusion parameters satisfy


(34)  $r_1 > 0$,  $R > 0$,  $A_2 > 0$ ,

(35)  $r_1 > 0$,  $R > 0$,  $A_2 < 0$,  $-A_2 R < r_1$ ,

(36)  $r_1 > 0$,  $R < 0$,  $A_2 < 0$ ,

(37)  $r_1 > 0$,  $R < 0$,  $A_2 > 0$,  $-A_2 R < r_1$ .


(The explicit enumeration of the parameter combinations that fall under (34) through (37) could likewise be carried out as was done for combinations [17] through [40].) For the remaining parameter situations


(38)  $r_1 > 0$,  $R > 0$,  $A_2 < 0$,  $-A_2 R > r_1$ ,

(39)  $r_1 > 0$,  $R < 0$,  $A_2 > 0$,  $-A_2 R > r_1$ ,


a single critical number policy is optimal. The optimal policy is to use control mode 2 whenever the state of the system is above level z and mode 1 when the state is below z, where z is the unique positive solution to the transcendental equation


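(40)  $\frac{r_1}{\alpha}\,\rho\,[\beta(\beta-\rho)e^{-\beta z} - \nu(\nu-\rho)e^{-\nu z}] + \left(R - \frac{r_1}{\alpha}\right)(\beta-\nu)(\rho-\beta)(\nu-\rho)\,e^{-(\nu+\beta)z} = 0$ ,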


and 


$\nu = \dfrac{\mu_1 - \sqrt{\mu_1^2 + 2\alpha\sigma_1^2}}{\sigma_1^2}$ .


5.3.  Reflection  and  No  Switching  Costs 

In this section we solve the two mode control problem with reflection at the boundary and zero switching costs. That is, we construct a stationary policy that is optimal and likewise compute the associated optimal return function. To emphasize the effect of linear holding costs in the reflecting barrier case, we will assume that the operational costs $r_1$ and $r_2$ are equal. Therefore, as seen in Section 5.2, we may set $r_1$ and $r_2$ at zero.

We first look at the problem where the linear holding costs are incurred at a positive rate, and without loss of generality let h = 1. Again, we begin our analysis with the simple band policies $f_1$ (always use mode 1) and $f_2$ (always use mode 2). For policy $f_1$ and any initial state $x \in S$ we have


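(41)  $V_{f_1}(x) = E\left[\int_0^{T_x} e^{-\alpha t} Z_1(t|x)\,dt\right] + V_{f_1}(0)\,E[e^{-\alpha T_x}]$ ,

where $Z_1(\cdot|x)$ is (unrestricted) Brownian Motion starting in x with drift $\mu_1$ and variance parameter $\sigma_1^2$, and where $T_x = \inf\{t \ge 0 : Z_1(t|x) = 0\}$. (Recall that in the absence of switching costs the return function associated with any stationary policy is a function of state only.) We have already seen that

$E[e^{-\alpha T_x}] = e^{-\beta x}$

and that

$E\left[\int_0^{\infty} e^{-\alpha t} Z_1(t|x)\,dt\right] = \frac{x}{\alpha} + \frac{\mu_1}{\alpha^2}$ .

Now since

$E\left[\int_{T_x}^{\infty} e^{-\alpha t} Z_1(t|x)\,dt\right] = E\left[e^{-\alpha T_x} \int_0^{\infty} e^{-\alpha t} Z_1(t|0)\,dt\right]$ ,

it must be that

$E\left[\int_0^{T_x} e^{-\alpha t} Z_1(t|x)\,dt\right] = \frac{x}{\alpha} + \frac{\mu_1}{\alpha^2} - \frac{\mu_1}{\alpha^2}\,e^{-\beta x}$ ,  for all $x \in S$ .

Hence

(42)  $V_{f_1}(x) = \frac{x}{\alpha} + \frac{\mu_1}{\alpha^2} + \left[V_{f_1}(0) - \frac{\mu_1}{\alpha^2}\right] e^{-\beta x}$ ,  for all $x \in S$ ,

and similarly

(43)  $V_{f_2}(x) = \frac{x}{\alpha} + \frac{\mu_2}{\alpha^2} + \left[V_{f_2}(0) - \frac{\mu_2}{\alpha^2}\right] e^{-\rho x}$ ,  for all $x \in S$ .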
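We can determine the value of $V_{f_1}(0)$ and $V_{f_2}(0)$ by applying boundary condition (36) from Theorem 4 in Chapter 3 to return functions $V_{f_1}$ and $V_{f_2}$ respectively. The boundary conditions here are

(44)  $V'_{f_1}(0) = V'_{f_2}(0) = 0$ ,

which result in

(45)  $V_{f_1}(0) = \frac{\mu_1}{\alpha^2} + \frac{1}{\alpha\beta}$ ,

(46)  $V_{f_1}(x) = \frac{x}{\alpha} + \frac{\mu_1}{\alpha^2} + \frac{1}{\alpha\beta}\,e^{-\beta x}$ ,  for all $x \in S$ ,

(47)  $V_{f_2}(0) = \frac{\mu_2}{\alpha^2} + \frac{1}{\alpha\rho}$ ,

and

(48)  $V_{f_2}(x) = \frac{x}{\alpha} + \frac{\mu_2}{\alpha^2} + \frac{1}{\alpha\rho}\,e^{-\rho x}$ ,  for all $x \in S$ .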
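
As a sanity check on the closed form for an "always use one mode" policy, the following Python sketch evaluates $x/\alpha + \mu/\alpha^2 + e^{-bx}/(\alpha b)$ and confirms numerically that it satisfies $\frac{1}{2}\sigma^2 V'' + \mu V' - \alpha V + x = 0$ with $V'(0) = 0$. It assumes that the exponent b is the positive root of $\frac{1}{2}\sigma^2 b^2 - \mu b - \alpha = 0$ (the defining equation appears earlier in the dissertation, outside this section, so this is an assumption here). The parameter values and function names are hypothetical.

```python
import math

def decay_rate(mu, sigma2, alpha):
    """Positive root b of 0.5*sigma2*b**2 - mu*b - alpha = 0 (assumed definition)."""
    return (mu + math.sqrt(mu * mu + 2.0 * alpha * sigma2)) / sigma2

def v_single_mode(x, mu, sigma2, alpha):
    """Return function of a single-mode policy with holding cost rate h = 1."""
    b = decay_rate(mu, sigma2, alpha)
    return x / alpha + mu / alpha**2 + math.exp(-b * x) / (alpha * b)

def ode_residual(x, mu, sigma2, alpha, eps=1e-4):
    """Finite-difference estimate of 0.5*sigma2*V'' + mu*V' - alpha*V + x."""
    v = lambda y: v_single_mode(y, mu, sigma2, alpha)
    d1 = (v(x + eps) - v(x - eps)) / (2.0 * eps)
    d2 = (v(x + eps) - 2.0 * v(x) + v(x - eps)) / eps**2
    return 0.5 * sigma2 * d2 + mu * d1 - alpha * v(x) + x

mu1, sigma1_sq, alpha = 0.3, 1.5, 0.1      # hypothetical values
h = 1e-6
slope_at_zero = (v_single_mode(h, mu1, sigma1_sq, alpha)
                 - v_single_mode(0.0, mu1, sigma1_sq, alpha)) / h
residuals = [ode_residual(x, mu1, sigma1_sq, alpha) for x in (0.5, 1.0, 2.0, 5.0)]
```

Both `slope_at_zero` and every entry of `residuals` should vanish up to finite-difference error, which is the reflecting boundary condition and the differential equation respectively.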

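This leads to the following relationships on S

(49)  $D_2 V_{f_1}(x) - \alpha V_{f_1}(x) + x = \frac{\mu_2-\mu_1}{\alpha} + \frac{1}{\alpha\beta}\,A_1\,e^{-\beta x}$

and

(50)  $D_1 V_{f_2}(x) - \alpha V_{f_2}(x) + x = \frac{\mu_1-\mu_2}{\alpha} + \frac{1}{\alpha\rho}\,A_2\,e^{-\rho x}$ .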
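
Since the right hand side of (49) is a constant plus a single exponential, it is monotone in x, so it is non-negative on all of S exactly when it is non-negative both at x = 0 and in the limit as x grows. The Python sketch below applies this endpoint test to policy $f_1$. It assumes that $\beta$ is the positive root of $\frac{1}{2}\sigma_1^2\beta^2 - \mu_1\beta - \alpha = 0$ and that $A_1 = \frac{1}{2}\sigma_2^2\beta^2 - \mu_2\beta - \alpha$; both definitions precede this section, so they are stated here as assumptions. The numerical values are hypothetical.

```python
import math

def decay_rate(mu, sigma2, alpha):
    """Positive root of 0.5*sigma2*b**2 - mu*b - alpha = 0 (assumed definition)."""
    return (mu + math.sqrt(mu * mu + 2.0 * alpha * sigma2)) / sigma2

def rhs_49(x, mu1, s1, mu2, s2, alpha):
    """(mu2 - mu1)/alpha + A1*exp(-beta*x)/(alpha*beta), with A1 as assumed above."""
    beta = decay_rate(mu1, s1, alpha)
    a1 = 0.5 * s2 * beta**2 - mu2 * beta - alpha
    return (mu2 - mu1) / alpha + a1 * math.exp(-beta * x) / (alpha * beta)

def always_mode_1_passes(mu1, s1, mu2, s2, alpha):
    """Endpoint test: the monotone expression is >= 0 on S iff it is at 0 and at infinity."""
    at_zero = rhs_49(0.0, mu1, s1, mu2, s2, alpha)
    at_infinity = (mu2 - mu1) / alpha
    return at_zero >= 0.0 and at_infinity >= 0.0

# Equal drifts, mode 1 less variable: the test passes.
equal_drift = always_mode_1_passes(0.2, 1.0, 0.2, 2.0, 0.1)
# Mode 1 has the larger drift: the test fails as x grows.
larger_drift = always_mode_1_passes(0.5, 1.0, 0.2, 2.0, 0.1)
```

The same test applies to policy $f_2$ through (50) after interchanging the roles of the two modes.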


Therefore the (necessary and sufficient) optimality conditions are satisfied for policy $f_1$ if and only if


(51)  $\mu_1 = \mu_2$ ,  $A_1 \ge 0$ ,
while it is optimal to always use mode 2 if we have

[42]  $\sigma_1^2 \ge \sigma_2^2$ .

There then remains the situation where $\mu_1 > \mu_2$ and $\sigma_1^2 < \sigma_2^2$. Under such circumstances we think of mode 1 as the "tortoise" and mode 2 as the "hare", and we call our control problem a tortoise-hare problem. This tortoise-hare arrangement suggests that we next investigate the single critical number policy $f_z$ given by

$f_z(x,1) = f_z(x,2) = \begin{cases} 1 , & \text{if } x \in [0,z] \\ 2 , & \text{if } x \in (z,\infty) . \end{cases}$
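
To make the policy $f_z$ concrete, the following Python sketch simulates the controlled process on a time grid and estimates the expected discounted holding cost by Monte Carlo. It is only an illustration under stated assumptions: the Euler time step, the truncation horizon, reflection by taking absolute values, and all parameter values are our choices, not part of the analysis. The absolute-value step reproduces reflected Brownian motion exactly at grid points only when the drift is zero; with drift it is an approximation.

```python
import math
import random

def simulate_cost(x0, z, modes, alpha, horizon=10.0, dt=0.02, n_paths=2000, seed=0):
    """Estimate E[ integral_0^horizon exp(-alpha*t) X(t) dt ] under policy f_z.

    modes maps 1 and 2 to (drift, variance); mode 1 is used on [0, z], mode 2 above z.
    """
    rng = random.Random(seed)
    n_steps = int(round(horizon / dt))
    weights = [math.exp(-alpha * k * dt) * dt for k in range(n_steps)]
    step = {m: (mu * dt, math.sqrt(s2 * dt)) for m, (mu, s2) in modes.items()}
    total = 0.0
    for _ in range(n_paths):
        x = x0
        for w in weights:
            total += w * x                                    # holding cost, h = 1
            drift, vol = step[1] if x <= z else step[2]       # critical number rule
            x = abs(x + drift + vol * rng.gauss(0.0, 1.0))    # reflect at zero
    return total / n_paths

# Hypothetical tortoise-hare parameters: mode 1 has larger drift, smaller variance.
estimate = simulate_cost(x0=1.0, z=0.8, modes={1: (0.5, 1.0), 2: (0.2, 2.0)},
                         alpha=1.0, n_paths=200)
```

Setting `z = float("inf")` recovers the policy of always using mode 1, so the estimate can be compared against the closed form $x/\alpha + \mu_1/\alpha^2 + e^{-\beta x}/(\alpha\beta)$ as a check on the simulation.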

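Using the Markov arguments of Section 5.2 that led to (20) and those above that led to (42) and (43), we see that

(53)  $V_{f_z}(x) = \frac{x}{\alpha} + [1 - \varphi(x) - \psi(x)]\frac{\mu_1}{\alpha^2} - \varphi(x)\frac{z}{\alpha} + \varphi(x)\,V_{f_z}(z) + \psi(x)\,V_{f_z}(0)$ ,  for $x \in [0,z]$

and

(54)  $V_{f_z}(x) = \frac{x}{\alpha} + [1 - \theta(x)]\frac{\mu_2}{\alpha^2} - \theta(x)\frac{z}{\alpha} + \theta(x)\,V_{f_z}(z)$ ,  for $x \in [z,\infty)$ ,

where

$\varphi(x) = \dfrac{e^{-\nu x} - e^{-\beta x}}{e^{-\nu z} - e^{-\beta z}}$ ,  for $x \in [0,z]$ ,

$\psi(x) = \dfrac{e^{-\nu z - \beta x} - e^{-\beta z - \nu x}}{e^{-\nu z} - e^{-\beta z}}$ ,  for $x \in [0,z]$ ,

$\theta(x) = e^{-\rho(x-z)}$ ,  for $x \in [z,\infty)$ .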

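From Theorem 4, Chapter 3 we find that $V'_{f_z}(x)$ exists everywhere on S and is zero at x = 0. Thus we have the following two equations in the two unknowns $V_{f_z}(0)$ and $V_{f_z}(z)$

(55)  $\frac{1}{\alpha} + \left[V_{f_z}(z) - \frac{z}{\alpha} - \frac{\mu_1}{\alpha^2}\right]\varphi'(0) + \left[V_{f_z}(0) - \frac{\mu_1}{\alpha^2}\right]\psi'(0) = 0$ ,

(56)  $\left[V_{f_z}(z) - \frac{z}{\alpha} - \frac{\mu_1}{\alpha^2}\right]\varphi'(z) + \left[V_{f_z}(0) - \frac{\mu_1}{\alpha^2}\right]\psi'(z) - \left[V_{f_z}(z) - \frac{z}{\alpha} - \frac{\mu_2}{\alpha^2}\right]\theta'(z) = 0$ ,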
which we solve for


P1  r a/  (v-P)z  (P-v)Zv  -pZ,  ' VZ  *PZ\/  VZ  PZ.  ,-1 

(57)  V (0)  = -p  [ vP( ev  K'  + e'K  ' ) + pe  (e  - e K )(ve  - peK  )] 
z a 

• (vp(e(v'0)z  + e(f3_v)z)  + pe'pZ(e’vZ  - e‘pZ)(vevZ  - pePz) 

+ (p-v)  pe"pZ(e^Z  - evZ) ) 

P2  r Q/  (v-P)z  (p-v)zN  -pZ/  -vz  -Pzw  vz  pz.,-1 

- ~ (vP(ev  + eVK  ' ) + pe  ^ (e  - e K )(ve  -peK  )] 

a 

■ (s-v)  0e-°2(epz  - eVZ) 

-i  (v.-pz  - &e"vz) 

♦ oe"pZ( e-vz  - e’Pz)(vevz  - - ^-VZ)]-1 

• (vS(e'vZ  - e'ez)  (e(v"P,z  * . (p.v)( v-e)( e"vz-e'l3z) 

+ pe”pZ( e-vz  - e‘Pz)2  (veVZ-^z))  , 


and 


The explicit solution of $V_{f_z}$ on S, then, is as follows


(59)  Vf  (x)  = £+-4  [e‘vZ  - e'32]'1 


• ((e'vZ  - e'Pz)  + (e'vX  - e*pX)[A(z)-H 

+ (e’vZ-px  - e"Pz"vX) [5( z) _ l] } 

u2  r -V'Z  -£z,- 1 

+ —ri  le  - e K I 

a 

. {( e-vX  - e’Px)  B(z)  - (e-vz^x  - e^Z~vX)  E(z)J 


* l te”vZ  . .-“v1 

• {( e’vx  - e'^x)  Z(z)  ■ (,"”"Sx  . e‘P2'vx)  f(z)l  , 

for  0 < x < z 


(60)  V ( x)  = *■  + —i  e”pX  A(z)  + [1  - e-px  + e’pX  B(z)] 


a £ 


+ -^  e~pX[C( z)-ll  , for  x>z. 


where 


X(z)  = [vP(e(v‘P)z  + e(P_v)z)  + pe'pZ(e'vZ  - e'Pt)  (vevZ  - pe32)!-1 


• v3(  e 


(v-P)z  , J0-vK 


5/-\  r o/  (v-P)z  (B-v)Zx  -pz,  -vz  -pz.  . vz  Q Pz.  ,-1 
B(z)  = [vP( e'  K/  + eVK  r'  ) + pe  H (e  - e K ) (ve  - 0e^  )] 


• pe'pt(e‘vZ  - e"pZ)  (vevZ  - (3e32)  , 
5(«)  , tvW«(v'S)i  + «(S'V)Z)  * - e’62)  (veVZ  - Ue^ir1

• (v-8)  (e'vz  - e'pz)  , 

- , . r (v-S)z  (8-v)Z\  -pz,  -vz  -Pz,  . vz  BZv,-l 

D(  z)  = [vP(ev  + e'H  ' ) + pe  (e  - e ) ( ve  - Pe  ) I 

• tve(e(v'Wz  * e(fJ-v)z)  + ce^V*2  - e'®2)  (ve*2  - P.^) 

* O-v)  0e-°2(eB2  - e*z) 1 , 

8(0  = CvP(e(v-B)2  ♦ e(p-*>2  + - e'Bz)  (veVZ  - Pe^)!"1 

• O-v)  oe'02^2  - evz)  , 

and 

f(z)  = tve"^Z  - pe‘vZl'1  (p-v)  . 

As in Section 5.2 the optimality conditions now lead us to choose our critical number so that $V''_{f_z}(z-) = V''_{f_z}(z+)$. Therefore we wish z to satisfy the following equation

(6i)  - ^|]  ( [e*vZ  - e“P*rl  (Pe‘3Z  - ve‘vZ)  [X(*)-l]  - P?  e’oZ  S(  z) 

La  a J 

+ [e”vZ  - e'pZ]‘l  (v-P)  e‘(v+P)z  [D( z)-ll } 

+ 3 ( [e"vZ  - e-pzrl  (pe_pZ  - ve'VZ)  C(z)  - p2  e’pZ  [C(z)-1] 

+ [e'vZ  - e_Pzrl  (p-v)  e‘(v+P)z  F(z)}  = 0 . 
Call the left hand side of (61) $F(z)$, where $z \in [0,\infty)$, and let G be the function on $[0,\infty)$

$G(z) = (\rho-\nu)\,F(z) + F'(z)$ .

With  much  pencil  pushing  it  can  be  shown  that  F( 0)  > 0,  lim  F(  z)  = -oo^ 

2 ? f “ 

and  G(  z)  <0  for  all  z £ [0,0°)  when  |i^  > < 0 1 and  a > 

Similarly  F(0)  < 0,  lim  F(  z)  ± +oo,  and  G(  z)  >0  for  all  z £ [0,oo) 

Z t oo 
2 2 

when  and  0 < OC  < Thus  as  argued  in  Section  5*2, 

equation (61) ($F(z) = 0$) has a unique positive solution $z^*$. Taking this $z^*$ we substitute back into (59), (60) and the related expressions $A(z^*)$ through $F(z^*)$. Let H be the following function on S


(62)  $H(x) = D_2 V_{f_{z^*}}(x) - D_1 V_{f_{z^*}}(x)$ .


When expanded, definition (62) becomes


(63)  H(x) 


r -VZ*  -pz»,-l  5/ 


a 


[e-v-  . §(  z*)  {A4  e"vX  - e-pX) 


+ [e‘vZ*  - e-P**}-1  E(  z*)  (Au  e‘6^-vx  _ ^-yz*^ 

+ 5 [e”vZ*  - C(z*)  {A^  e‘vX  - Ax  e'Px} 

+ i [e“vZ*  . e-Pz*]-1  F(  z*)  {A4  g-P^-voc  _ ^ e-vZ*-pXj 
(m2-mx) 


a 


for  0 < x < z* 
and


(64)


H(x)  = ---  gl}  S(z*)  A e"pX  +±  [1  - 6(z*)]  A e"pX  + 
a * 


(^2"pi) 

~ir 


for  x > z* 


where  we  have  added  to  the  previous  parameter  functions  A^  Ag 
the  function 


and 


$A_4 = \tfrac{1}{2}\sigma_2^2\nu^2 - \mu_2\nu - \alpha$ .

Inspection of (63) and (64) shows that when $\mu_1 > \mu_2$ and $\sigma_1^2 < \sigma_2^2$, we have that $H(x) > 0$ for $x \in [0,z^*)$, $H(z^*) = 0$ and $H(x) < 0$ for $x \in (z^*,\infty)$. Thus we conclude that $D_2 V_{f_{z^*}}(x) - \alpha V_{f_{z^*}}(x) + x \ge 0$ on $[0,z^*]$ and $D_1 V_{f_{z^*}}(x) - \alpha V_{f_{z^*}}(x) + x \ge 0$ on $[z^*,\infty)$; and the single critical number policy that uses control mode 2 whenever the state of the system is above level $z^*$, the unique solution to (61), and uses control mode 1 when the state is below $z^*$ is optimal for our tortoise-hare problem.

If instead the linear holding costs are incurred at a negative rate and we re-scale our system so that h = -1, we find as expected that the policy of always using mode 1 is optimal if $\sigma_1^2 > \sigma_2^2$, while

the policy of always using mode 2 is optimal only if $\mu_1 = \mu_2$ and $\sigma_1^2 < \sigma_2^2$. Finally, for the tortoise-hare situation $\mu_1 > \mu_2$ and $\sigma_1^2 < \sigma_2^2$, the optimal policy is a single critical number policy where mode 1 is used whenever the state of the system is above the critical level and mode 2 is used when the state is below.


5.4.  Reflection  and  Switching  Costs 

We would now like to investigate the effect of switching costs on our solutions in the reflecting barrier control problem. To keep our calculations somewhat manageable we will specifically treat the tortoise-hare situation ($\mu_1 > \mu_2$ and $\sigma_1^2 < \sigma_2^2$) and assume equal operational costs (WLOG $r_1 = r_2 = 0$), positive holding costs (WLOG h = 1), and a symmetric switching cost $K_{12} = K_{21} = K > 0$. Note that with the addition of switching costs, the return function corresponding to any stationary policy is a function of both state and mode.

As before, we look first at the simple band policies $f_1$ (always use mode 1) and $f_2$ (always use mode 2). Referring to (46) and (48) we have that


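$V_{f_1}(x,1) = \frac{x}{\alpha} + \frac{\mu_1}{\alpha^2} + \frac{1}{\alpha\beta}\,e^{-\beta x}$ ,  $x \in S$ ,

$V_{f_1}(x,2) = K + \frac{x}{\alpha} + \frac{\mu_1}{\alpha^2} + \frac{1}{\alpha\beta}\,e^{-\beta x}$ ,  $x \in S$ ,

$V_{f_2}(x,1) = K + \frac{x}{\alpha} + \frac{\mu_2}{\alpha^2} + \frac{1}{\alpha\rho}\,e^{-\rho x}$ ,  $x \in S$ ,

$V_{f_2}(x,2) = \frac{x}{\alpha} + \frac{\mu_2}{\alpha^2} + \frac{1}{\alpha\rho}\,e^{-\rho x}$ ,  $x \in S$ .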


Optimality  condition  (13)  of  Theorem  1 in  Chapter  4 is  immediately 


likewise contradicts (72) for $\sigma_1^2 < \sigma_2^2$. Therefore we have shown that with a tortoise and hare alignment of control modes, it is not optimal to strictly use either mode.

The results of Section 5.3 suggest that we now restrict attention to policies $f_*$ of the following form

$f_*(x,1) = \begin{cases} 1 , & \text{if } x \in [0,Z) \\ 2 , & \text{if } x \in [Z,\infty) \end{cases}$

and

$f_*(x,2) = \begin{cases} 1 , & \text{if } x \in [0,z] \\ 2 , & \text{if } x \in (z,\infty) , \end{cases}$

where $0 \le z \le Z \le \infty$. Such a policy is called a two critical numbers policy and is characterized by the values of the two switching levels z and Z.
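
The two switching levels act as a hysteresis band: the mode in use is kept while the state stays strictly between the levels, which limits how often the switching cost K is paid. The following Python sketch encodes the policy and counts mode changes along a hypothetical list of observed states; the function names and the sample path are ours and serve only as illustration.

```python
def two_critical_numbers_policy(x, mode, z, Z):
    """Mode prescribed in state x when `mode` is currently in use."""
    if not 0.0 <= z <= Z:
        raise ValueError("need 0 <= z <= Z")
    if mode == 1:
        return 1 if x < Z else 2      # leave mode 1 only at or above Z
    if mode == 2:
        return 1 if x <= z else 2     # leave mode 2 only at or below z
    raise ValueError("mode must be 1 or 2")

def switch_count(path, start_mode, z, Z):
    """Number of mode changes along a sequence of observed states."""
    mode, switches = start_mode, 0
    for x in path:
        new_mode = two_critical_numbers_policy(x, mode, z, Z)
        if new_mode != mode:
            switches += 1
        mode = new_mode
    return switches

path = [0.5, 1.6, 1.4, 1.6, 1.4, 2.1, 0.9]          # hypothetical observations
n_band = switch_count(path, 1, z=1.0, Z=2.0)         # distinct levels
n_single = switch_count(path, 1, z=1.5, Z=1.5)       # levels coincide
```

With coinciding levels the policy reduces to a single critical number rule and reacts to every crossing of 1.5, whereas the band ignores the small oscillations between 1.4 and 1.6.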

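Suppose that z = 0 and $Z = \infty$. Then policy $f_*$ will continue with the initial control mode forever and

(73)  $V_{f_*}(x,1) = \frac{x}{\alpha} + \frac{\mu_1}{\alpha^2} + \frac{1}{\alpha\beta}\,e^{-\beta x}$ ,  for all $x \in S$

and

(74)  $V_{f_*}(x,2) = \frac{x}{\alpha} + \frac{\mu_2}{\alpha^2} + \frac{1}{\alpha\rho}\,e^{-\rho x}$ ,  for all $x \in S$ .

Since

$D_1 V_{f_*}(x,1) - \alpha V_{f_*}(x,1) + x = \left[\tfrac{1}{2}\sigma_1^2\beta^2 - \mu_1\beta - \alpha\right]\frac{1}{\alpha\beta}\,e^{-\beta x} = 0$ ,  $x \in S$

and

$D_2 V_{f_*}(x,2) - \alpha V_{f_*}(x,2) + x = \left[\tfrac{1}{2}\sigma_2^2\rho^2 - \mu_2\rho - \alpha\right]\frac{1}{\alpha\rho}\,e^{-\rho x} = 0$ ,  $x \in S$ ,

optimality condition (14) of Chapter 4 holds everywhere for all possible parameters. However, optimality condition (13) of Chapter 4 will hold if and only if both of the following are satisfied.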


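(75)  $V_{f_*}(x,1) - V_{f_*}(x,2) = \frac{\mu_1-\mu_2}{\alpha^2} + \frac{\rho e^{-\beta x} - \beta e^{-\rho x}}{\alpha\beta\rho} \le K$ ,  for all $x \in S$

and

(76)  $V_{f_*}(x,2) - V_{f_*}(x,1) = \frac{\mu_2-\mu_1}{\alpha^2} + \frac{\beta e^{-\rho x} - \rho e^{-\beta x}}{\alpha\beta\rho} \le K$ ,  for all $x \in S$ .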


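We evaluate (75) and (76) separately for the cases $\rho < \beta$ and $\rho > \beta$ and find inequalities (75) and (76) to hold everywhere if

$\rho < \beta$ ,  $\mu_1 - \mu_2 \le \alpha^2 K$ ,  $\mu_2 - \mu_1 + \frac{\alpha(\beta-\rho)}{\beta\rho} \le \alpha^2 K$

or

$\rho > \beta$ ,  $\mu_1 - \mu_2 + \frac{\alpha(\rho-\beta)}{\beta\rho} \le \alpha^2 K$ ,  $\mu_2 - \mu_1 \le \alpha^2 K$

hold. Thus we conclude that the policy of never switching control mode will be optimal if our cost and diffusion parameters fall into one of the following situations

[43]  $\mu_1 > \mu_2$ ,  $\sigma_1^2 < \sigma_2^2$ ,  $\rho < \beta$ ,  $\mu_1 - \mu_2 \le \alpha^2 K$ ,  $\mu_2 - \mu_1 + \frac{\alpha(\beta-\rho)}{\beta\rho} \le \alpha^2 K$ ,

[44]  $\mu_1 > \mu_2$ ,  $\sigma_1^2 < \sigma_2^2$ ,  $\rho > \beta$ ,  $\mu_1 - \mu_2 + \frac{\alpha(\rho-\beta)}{\beta\rho} \le \alpha^2 K$ .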

Suppose now that z = 0 and $0 < Z < \infty$. Previous arguments (see Sections 5.2 and 5.3) then lead to

(76)  $V_{f_*}(x,1) = \begin{cases} \frac{x}{\alpha} + [1-\varphi(x)-\psi(x)]\frac{\mu_1}{\alpha^2} + \varphi(x)\frac{\mu_2}{\alpha^2} + \varphi(x)K + \varphi(x)\frac{1}{\alpha\rho}e^{-\rho Z} + \psi(x)\,V_{f_*}(0,1) , & x \in [0,Z) \\ K + \frac{x}{\alpha} + \frac{\mu_2}{\alpha^2} + \frac{1}{\alpha\rho}\,e^{-\rho x} , & x \in [Z,\infty) \end{cases}$

(77)  $V_{f_*}(x,2) = \frac{x}{\alpha} + \frac{\mu_2}{\alpha^2} + \frac{1}{\alpha\rho}\,e^{-\rho x}$ ,  $x \in S$ ,

where $\varphi$ and $\psi$ are the following functions on $[0,Z]$

$\varphi(x) = \dfrac{e^{-\nu x} - e^{-\beta x}}{e^{-\nu Z} - e^{-\beta Z}}$ ,

$\psi(x) = \dfrac{e^{-\nu Z - \beta x} - e^{-\beta Z - \nu x}}{e^{-\nu Z} - e^{-\beta Z}}$ .

Our characterization of return functions (Theorem 4, Chapter 3) states that $V'_{f_*}(0,1) = 0$, and so we have

(78)  $V_{f_*}(0,1) = \frac{\mu_1}{\alpha^2}[\psi'(0)]^{-1}[\varphi'(0)+\psi'(0)] - \frac{\mu_2}{\alpha^2}[\psi'(0)]^{-1}\varphi'(0) - \frac{1}{\alpha}[\psi'(0)]^{-1} - K[\psi'(0)]^{-1}\varphi'(0) - \frac{1}{\alpha\rho}[\psi'(0)]^{-1}\varphi'(0)\,e^{-\rho Z}$ ,

which when entered into (76) completes our derivation of the return function in terms of critical number Z. (We have refrained here from writing out in length all of the exponential-type expressions involved.)

If such a policy $f_*$ is to be optimal, then the optimality conditions include the requirement that $D_1 V_{f_*}(x,1) - \alpha V_{f_*}(x,1) + x \ge 0$ for all $x \in [Z,\infty)$. Thus we need that

$\frac{\mu_1-\mu_2}{\alpha} + A_2\,\frac{1}{\alpha\rho}\,e^{-\rho x} \ge \alpha K$ ,  for all $x \in [Z,\infty)$ ,

which can be the case only if $\frac{\mu_1-\mu_2}{\alpha} \ge \alpha K$. In order for $V_{f_*}(x,1) \le K + V_{f_*}(x,2)$ and $V_{f_*}(x,2) \le K + V_{f_*}(x,1)$ on S, we need Z to satisfy

(79)  $\left| \frac{\mu_1-\mu_2}{\alpha^2}[1-\varphi(x)] - \frac{\mu_1}{\alpha^2}\psi(x) + K\varphi(x) + \frac{1}{\alpha\rho}\left[\varphi(x)e^{-\rho Z} - e^{-\rho x}\right] + V_{f_*}(0,1)\,\psi(x) \right| \le K$ ,  for all $x \in [0,Z]$ .

Inspection of (79) shows that it will hold true only if $\rho < \beta$, and so we anticipate that a two critical numbers policy of the form $0 = z < Z < \infty$ will be optimal only under the following parameter situation

[45]  $\mu_1 > \mu_2$ ,  $\sigma_1^2 < \sigma_2^2$ ,  $\rho < \beta$ ,  $\mu_1 - \mu_2 > \alpha^2 K$ .

Restricting attention to [45] let F be the function on S defined by
$F(Z) = \mu_1 - \mu_2 + \left(\tfrac{1}{2}\sigma_1^2\rho - \mu_1 - \frac{\alpha}{\rho}\right) e^{-\rho Z}$ ,  $Z \in [0,\infty)$ .

We have already seen that $F(0) < 0$ and $\lim_{Z \uparrow \infty} F(Z) = \mu_1 - \mu_2 > \alpha^2 K$, and therefore there uniquely exists $Z^* > 0$ such that $F(Z^*) = \alpha^2 K$. We now compute explicitly $V_{f_*}(0,1)$ and return function $V_{f_*}$ on $S \times A$ using this critical number $Z^*$, and find that we have strict inequality in (79) on $[0,Z^*)$. The optimality conditions remaining to be verified are that $D_1 V_{f_*}(x,1) - \alpha V_{f_*}(x,1) + x = 0$ for all $x \in [0,Z^*]$ and that $D_2 V_{f_*}(x,2) - \alpha V_{f_*}(x,2) + x = 0$ for all $x \in S$, which are easily true by Theorem 4 of Chapter 3. Hence for parameter combination [45] a two critical numbers policy is optimal where z = 0 and $Z \in (0,\infty)$ is the unique solution to a transcendental equation.
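
Because F is a constant plus a single exponential in Z, the equation $F(Z^*) = \alpha^2 K$ can be inverted with a logarithm. The Python sketch below does this, reading F as $F(Z) = \mu_1 - \mu_2 + (\frac{1}{2}\sigma_1^2\rho - \mu_1 - \alpha/\rho)e^{-\rho Z}$ and assuming that $\beta$ and $\rho$ are the positive roots of $\frac{1}{2}\sigma_a^2 r^2 - \mu_a r - \alpha = 0$ for modes 1 and 2 respectively (definitions that precede this section, hence assumptions here). The parameter values are hypothetical and were chosen to satisfy combination [45].

```python
import math

def decay_rate(mu, sigma2, alpha):
    """Positive root of 0.5*sigma2*r**2 - mu*r - alpha = 0 (assumed definition)."""
    return (mu + math.sqrt(mu * mu + 2.0 * alpha * sigma2)) / sigma2

def F(Z, mu1, s1, mu2, alpha, rho):
    """F(Z) as read above: (mu1 - mu2) + c * exp(-rho * Z)."""
    c = 0.5 * s1 * rho - mu1 - alpha / rho
    return (mu1 - mu2) + c * math.exp(-rho * Z)

def upper_critical_number(mu1, s1, mu2, s2, alpha, K):
    """Solve F(Z) = alpha**2 * K for Z under parameter combination [45]."""
    beta = decay_rate(mu1, s1, alpha)
    rho = decay_rate(mu2, s2, alpha)
    if not (mu1 > mu2 and s1 < s2 and rho < beta and mu1 - mu2 > alpha**2 * K):
        raise ValueError("parameters are outside combination [45]")
    c = 0.5 * s1 * rho - mu1 - alpha / rho      # negative, since F(0) < 0
    return -math.log((alpha**2 * K - (mu1 - mu2)) / c) / rho

# Hypothetical values: s1 and s2 denote the variances of modes 1 and 2.
mu1, s1, mu2, s2, alpha, K = 1.0, 1.0, 0.0, 4.0, 1.0, 0.5
Z_star = upper_critical_number(mu1, s1, mu2, s2, alpha, K)
rho = decay_rate(mu2, s2, alpha)
```

A root finder is unnecessary here only because of the one-exponential form of F; the two-level case treated next has no such shortcut.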

We are left with two arrangements of parameters to account for, namely


[46] 


and 


[47] 


hi  >d2 


2 2 
°1  < a2 


D > 6 


Ul-U2 


SLa=el  >a2K 

OP 
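If $0 < z < Z < \infty$, then our return function in general form is given by

(80)  $V_{f_*}(x,1) = \begin{cases} \frac{x}{\alpha} + [1-\varphi(x)-\psi(x)]\frac{\mu_1}{\alpha^2} - \varphi(x)\frac{Z}{\alpha} + \varphi(x)[K + V_{f_*}(Z,2)] + \psi(x)\,V_{f_*}(0,1) , & x \in [0,Z) \\ K + \frac{x}{\alpha} + [1-\delta(x)]\frac{\mu_2}{\alpha^2} - \delta(x)\frac{z}{\alpha} + \delta(x)[K + V_{f_*}(z,1)] , & x \in [Z,\infty) \end{cases}$

and

(81)  $V_{f_*}(x,2) = \begin{cases} K + \frac{x}{\alpha} + [1-\varphi(x)-\psi(x)]\frac{\mu_1}{\alpha^2} - \varphi(x)\frac{Z}{\alpha} + \varphi(x)[K + V_{f_*}(Z,2)] + \psi(x)\,V_{f_*}(0,1) , & x \in [0,z] \\ \frac{x}{\alpha} + [1-\delta(x)]\frac{\mu_2}{\alpha^2} - \delta(x)\frac{z}{\alpha} + \delta(x)[K + V_{f_*}(z,1)] , & x \in (z,\infty) \end{cases}$

where additionally $\delta$ is defined on $[z,\infty)$ as

$\delta(x) = e^{-\rho(x-z)}$ ,  for all $x \ge z$ .

Imposing our boundary condition, $V'_{f_*}(0,1) = 0$, we then have the following three equations

(82)  $V_{f_*}(z,1) = [1-\varphi(z)-\psi(z)]\frac{\mu_1}{\alpha^2} - \varphi(z)\frac{Z}{\alpha} + \frac{z}{\alpha} + \varphi(z)[K + V_{f_*}(Z,2)] + \psi(z)\,V_{f_*}(0,1)$ ,

(83)  $V_{f_*}(Z,2) = [1-\delta(Z)]\frac{\mu_2}{\alpha^2} - \delta(Z)\frac{z}{\alpha} + \frac{Z}{\alpha} + \delta(Z)[K + V_{f_*}(z,1)]$ ,

and

(84)  $0 = \frac{1}{\alpha} + \varphi'(0)\left[K + V_{f_*}(Z,2) - \frac{Z}{\alpha} - \frac{\mu_1}{\alpha^2}\right] + \psi'(0)\left[V_{f_*}(0,1) - \frac{\mu_1}{\alpha^2}\right]$ ,

which can be solved for the unknowns $V_{f_*}(0,1)$, $V_{f_*}(z,1)$, and $V_{f_*}(Z,2)$. Substitution of these three values into (80) and (81), then, leaves return function $V_{f_*}$ in terms of unspecified z and Z.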

If we now set out to minimize $V_{f_*}(x,1)$ and $V_{f_*}(x,2)$ over all possible (z,Z) pairs and for all $x \in S$, we find that when [46] and [47] hold our task is equivalent to minimizing the following two expressions with respect to z and Z,

(85)  S(z,Z)  = \[e'vZ  - e-0Z  - e_P(z_2)"vZ  + e-p(Z-z)-Pz]  fl  _ y(e-vZ  . ^PZjj 

+ O-v)  [e"vZ  - e"0Z  - e"o(Z*Z)_vZ  + e-P(Z-*)-P*] 

• [1  + e(v'^Z  - e"(  p-v)(Z-z)  j 

+ (p-v)  e*p(Z_z)  [e'vZ  - e"0Z  - e'p(z-z)-vz  + e-p(Z-z)-pzj 

• [1  - e'v(Z"z)  + e“^Z+vZ] J _1 

• |v(e’0Z  - e'vZ)(e‘0Z  - e'vZ)  [e'vZ  - e‘0Z  - e_p(Z’z)‘vZ 

+ e-p(Z-z)-pzj 

+ O-v)  ( e~0z  - e'vZ)[l  + e(v_0)Z  - e(v“p>(Z‘Z> 

+ e-p(Z-z)(l-e"v(Z-^+e-0Z-vZ)]} 

and 
(86)


H(z,Z)  = |[e'vZ  - e"pZ  - e"P(z-z)-^  + e_p(Z"z)"pZ][  l-v(e'vZ  - e_f3Z)  ] 
+ 0-v)[e-vZ  - e"PZ  - e-°(Z-Z)-vZ  + 

• [1  + e(v_P)Z  - e’(p'v)(Z'z)] 

+ O-v)  e"p(Z"z)(e'vZ  - e‘pZ  - e_p(Z'z)"vZ  + e-p(z“2)-P2] 

. [1  . e"v(Z_z)  + e_PZ+vZ]|  _1 

• |p(e"vZ  - e“pZ)(e_vZ  - e'pZ)[e'p(Z_Z)‘pZ  - ep(Z’Z)"vZ 

- e’Pz  + e”vZ] 

+ (v-P)(e_vZ  - e_Pz)  [1  - e’P(Z"z)  - eP(Z'z)  + e_vZ+Pz 

+ ep(Z'z)(l  + e(P"v)Z)]}  . 


We then investigate the following equations

(87)  $\frac{\partial G(z,Z)}{\partial z} = 0$ ,

(88)  $\frac{\partial G(z,Z)}{\partial Z} = 0$ ,

(89)  $\frac{\partial H(z,Z)}{\partial z} = 0$ ,

and

(90)  $\frac{\partial H(z,Z)}{\partial Z} = 0$ ,

to find that they will hold if and only if

(91)  $\frac{\partial G(z,Z)}{\partial z} = \frac{\partial H(z,Z)}{\partial z}$

and

(92)  $\frac{\partial G(z,Z)}{\partial Z} = \frac{\partial H(z,Z)}{\partial Z}$

do. That is, G(z,Z) and H(z,Z) have the same local optima. With parameter situation [46] we further have that


(93) 

lim  G( z,Z)  = - 

lim  H(  z,Z)  =00 

for 

fixed 

Z , 

z 1 0 

z 1 0 

(94)

lim  G( z,Z)  = - 

lim  H(z,Z)  = ® 

for 

fixed 

z , 

Z t 00 

Z t ® 

and 

(95)

lim  G(z,Z)  = - 

lim  H(z,Z)  = -oo  < 

z -»  Z 

z -»  Z 

while  with  [47]  the  results 

are  that 

(96) 

lim  G( z,Z)  = - 

lim  H(  z,Z)  = -00 

for 

fixed 

Z , 

z 1 0 

z l 0 

(97) 

lim  G(z,Z)  = - 

lim  H(z,Z)  = -00 

for 

fixed 

2 , 

Z t * 

Z 1 00 

and 

(98) 

lim  G(z,Z)  = - 

lim  H(  z;Z)  = 00 

z -»  Z 

z -*  Z 
Therefore there exist $z^*$ and $Z^*$, where $0 < z^* < Z^* < \infty$, satisfying (91) and (92). Under [46] the uniqueness of $z^*$ and $Z^*$ as solution to (91) and (92) follows since G(z,Z) decreases and H(z,Z) increases in z on (0,Z) for fixed Z, while G(z,Z) increases and H(z,Z) decreases in Z on $(z,\infty)$ for fixed z. Similarly, if [47] holds then G(z,Z) increases and H(z,Z) decreases in z on (0,Z) for fixed Z, while G(z,Z) decreases and H(z,Z) increases in Z on $(z,\infty)$ for fixed z.

Letting our two critical numbers be the unique positive solutions $z^*$ and $Z^*$ ($z^* < Z^*$) to (91) and (92), we conclude with verification of the necessary and sufficient optimality conditions. These can be summarized by the requirements

(99)  $|V_{f_*}(x,1) - V_{f_*}(x,2)| \le K$ ,  for all $x \in [z^*,Z^*]$ ,

(100)  $D_2 V_{f_*}(x,1) - \alpha V_{f_*}(x,1) + x - \alpha K \ge 0$ ,  for all $x \in [0,z^*]$ ,

and

(101)  $D_1 V_{f_*}(x,2) - \alpha V_{f_*}(x,2) + x - \alpha K \ge 0$ ,  for all $x \in [Z^*,\infty)$ .

Upon substitution of (91) and (92) into (80) and (81) with parameter restrictions [46] or [47], we find the following to be true

(102)  $V_{f_*}(x,1) - V_{f_*}(x,2)$ increasing in x on $[z^*,Z^*]$ ,
(103)  $V_{f_*}(z^*,1) - V_{f_*}(z^*,2) = V_{f_*}(Z^*,2) - V_{f_*}(Z^*,1) = -K$ ,

(104)  $\tfrac{1}{2}(\sigma_2^2-\sigma_1^2)\,V''_{f_*}(x,1) + (\mu_2-\mu_1)\,V'_{f_*}(x,1)$ increasing in x on $[0,z^*]$ ,

(105)  $\tfrac{1}{2}(\sigma_2^2-\sigma_1^2)\,V''_{f_*}(z^*,1) + (\mu_2-\mu_1)\,V'_{f_*}(z^*,1) = \alpha K$ ,

(106)  $\tfrac{1}{2}(\sigma_1^2-\sigma_2^2)\,V''_{f_*}(x,2) + (\mu_1-\mu_2)\,V'_{f_*}(x,2)$ increasing in x on $[Z^*,\infty)$ ,

and

(107)  $\tfrac{1}{2}(\sigma_1^2-\sigma_2^2)\,V''_{f_*}(Z^*,2) + (\mu_1-\mu_2)\,V'_{f_*}(Z^*,2) = \alpha K$ .

Statements (102) through (107) imply (99) through (101), and thus with parameter combinations [46] and [47] a two critical numbers policy is optimal where the critical numbers are the unique positive pair to simultaneously solve two transcendental equations.
types  of  boundary  behavior  are  considered.  Absorbing  barriers  arise  in
applications  to  collective  risk  and  insurance,  while  reflecting  barriers 
are  natural  for  problems  in  the  optimal  control  of  queueing  and  storage 
systems. 

When  there  are  only  two  control  modes,  one  expects  an  optimal 
policy  characterized  by  a pair  of  critical  numbers.  For  various  special 
cases,  it  is  shown  that  such  an  optimal  policy  exists,  and  (complicated) 
formulas  for  the  critical  numbers  are  derived. 